Security processing engines, circuits and systems and adaptive processes and other processes

ABSTRACT

An electronic circuit (200) includes one or more programmable control-plane engines (410, 460) operable to process packet header information and form at least one command, one or more programmable data-plane engines (310, 320, 370) selectively operable for at least one of a plurality of cryptographic processes selectable in response to the at least one command, and a programmable host processor (100) coupled to such a data-plane engine (310) and such a control-plane engine (410). Other processors, circuits, devices and systems and processes for their operation and manufacture are disclosed.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a divisional of application Ser. No. 15/205,487, filed Jul. 8, 2016, currently pending;

Which was a divisional of prior application Ser. No. 15/045,948, filed Feb. 17, 2016, now U.S. Pat. No. 9,503,265, issued Nov. 22, 2016;

Which was a divisional of prior application Ser. No. 14/712,396, filed May 14, 2015, now U.S. Pat. No. 9,305,184, issued Apr. 5, 2016;

Which was a divisional of prior application Ser. No. 13/165,190, filed Jun. 21, 2011, now U.S. Pat. No. 9,141,831, issued Sep. 22, 2015;

Which is related to provisional U.S. patent application “Security Processing Engines, Circuits and Systems and Adaptive Processes and Other Processes” Ser. No. 61/362,393, (TI-67750PS) filed Jul. 8, 2010, for which priority is claimed under 35 U.S.C. 119(e) and all other applicable law, and which is incorporated herein by reference in its entirety.

And is also related to provisional U.S. patent application “Mode Control Engine (MCE) For Confidentiality and Other Modes, Circuits and Processes” Ser. No. 61/362,395, (TI-68484PS) filed Jul. 8, 2010, for which priority is claimed under 35 U.S.C. 119(e) and all other applicable law, and which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2004/0025036, “Run-time firmware authentication” dated Feb. 5, 2004, (TI-34918), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2007/0294496, “Methods, Apparatus, and Systems for Secure Demand Paging and Other Paging Operations for Processor Devices” dated Dec. 20, 2007, (TI-38213), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2008/0114993, “Electronic Devices, Information Products, Processes of Manufacture And Apparatus For Enabling Code Decryption in a Secure Mode Using Decryption Wrappers and Key Programming Applications, and Other Structures” dated May 15, 2008, (TI-38346), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2007/0110053 “Packet Processors and Packet Filter Processes, Circuits, Devices, and Systems”, dated May 17, 2007 (TI-39133), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2007/0226795, “Virtual Cores and Hardware-Supported Hypervisor Integrated Circuits, Systems, Methods and Processes of Manufacture” dated Sep. 27, 2007 (TI-61985), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2010/0138857, “Systems and Methods for Processing Data Packets” dated Jun. 3, 2010 (TI-63830), which is incorporated herein by reference in its entirety.

This application is related to U.S. Patent Application Publication 2010/0322415, “Multilayer Encryption of a Transport Stream Data and Modification of a Transport Header” dated Dec. 23, 2010 (TI-63831), which is incorporated herein by reference in its entirety.

This application is related to U.S. patent application Ser. No. 12/815,734 “Slice Encoding and Decoding Processors, Circuits, Devices, Systems and Processes” (TI-67049), filed Jun. 15, 2010, which is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable.

COPYRIGHT NOTIFICATION

Portions of this patent application contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document, or the patent disclosure, as it appears in the United States Patent and Trademark Office, but otherwise reserves all copyright rights whatsoever.

BACKGROUND

This invention is in the field of information and communications, and is more specifically directed to improved processes, circuits, devices, and systems for information and communication processing and/or protection against unauthorized interception of communications, and processes of operating, protecting and making them. Without limitation, the background is further described in connection with communications processing and wireless and wireline communications, and security processing.

Wireless communications, of many types, have gained increasing popularity in recent years. The mobile wireless (cellular) telephone has become ubiquitous around the world. Mobile telephony can communicate video and digital data, in addition to voice. Wireless devices, for communicating computer data over a wide area network, using mobile wireless telephone channels and techniques are also available. Ethernet and other wireline broadband technologies support many office systems and home systems.

Wireless data communications in wireless local area networks (WLAN), such as those operating according to the well-known IEEE 802.11 standard, have become especially popular in a wide range of installations, ranging from home networks to commercial establishments. Short-range wireless data communication according to the Bluetooth technology permits computer peripherals to communicate with a personal computer or workstation within the same room.

Security is essential to protect retail and other commercial transactions in electronic commerce. Security is vital to protect medical data, medical records, and other storage and transfer of personal data, or in any context in which personal privacy is desirable. Security is fundamental for both wireline and wireless communications and at multiple layers in communications, such as transport layer, network layer, and other layers. Added features and increasing numbers of security standards add further processing tasks to communications systems. These potentially involve additional software and hardware in systems that already face cost and power dissipation challenges. Even the ability of the system itself to keep up with the task load and rate of information flow may be jeopardized.

Each of the data communication security standards like IPSEC, SRTP, TLS, WiMax, Wireless 3G and Wireless 4G uses its own form of data cryptography and source authentication. (Refer to TABLE 1 Glossary of acronyms.) To make data communication more secure each security standard defines its own additional level of processing beyond standard cryptographic algorithmic processing (AES, 3DES, Kasumi etc). This additional processing called “mode operation” is different for each application and different within a given application depending upon current mode of operation and peer capabilities. This mode processing is sometimes very complex and calls for repeated cryptographic processing for a same data block. Some popular examples of the confidentiality modes that use AES or 3DES cores are CBC, OFB, CFB, CTR, GCM, and CCM which may be used in IPSEC applications. To secure wireless data traffic, transmitted via antenna, Kasumi-F8 and Snow3G-F8 are used in 3GPP technology, for a couple of examples.

This cryptographic “mode operation” processing presents a huge technological challenge, given that performance and chip area vitally matter, to support so many different types of processing in hardware even though the modes include the basic cryptography AES, 3DES, etc., in the process. Moreover, as security standards evolve, new modes are added continually to overcome or mitigate security issues as and when found in mode processing, thereby leading to a further problem of technologically keeping up with new modes of security processing in hardware.

If system hardware is to support multiple security standards at extremely high processing speeds and transfer rates (called bit-rates), more cryptography standards must be supported with high performance even though each standard defines its own data cryptography processes, authentication methods and operational encryption modes.

Hardware implementation of confidentiality modes like CBC, OFB, CFB, CTR, GCM, and CCM, conventionally calls for custom logic for each mode even when they may use the same cryptographic process (AES, 3DES etc). Performance and chip real estate area suffer. Competitive issues and market demands add yet further dimensions of performance, chip area, and QoS (Quality of service) to the challenge of implementing so many security standards. Moreover, as security standards evolve, new modes are invented continually in the industry to overcome or mitigate newly-detected types of attacks.

Departures for more efficient ways of handling and/or protecting packet and non-packet data, voice, video, and other content are needed for microprocessors, telecommunications apparatus and computer systems.

SUMMARY OF THE INVENTION

Generally, and in one form of the invention, an electronic circuit includes one or more programmable control-plane engines operable to process packet header information and form at least one command, one or more programmable data-plane engines selectively operable for at least one of a plurality of cryptographic processes selectable in response to the at least one command, and a programmable host processor coupled to such a data-plane engine and such a control-plane engine.

Generally, and in another form of the invention, a security context cache module is for use with a host processor and an external memory. The module includes a local cache memory, a local processor coupled with the local cache memory, an ingress circuit having an input for ingress of a packet stream including an ingress packet having a security context pointer, and an auto-fetch circuit responsive to such ingress packet and operable to automatically fetch a security context from the external memory to the local cache memory using the security context pointer, and to associate the security context in the local cache memory with the packet stream, the auto-fetch circuit operable for multiple such packet streams and ingress packets, whereby to allow simultaneous security connections.

Generally, and in a further form of the invention, a streaming interface for packet data includes a buffer circuit for a packet stream including a packet having an associated request field for thread identification, the buffer circuit operable to provide a ready signal indicating that the buffer circuit currently has at least a predetermined amount of space to accept data; and a data transfer circuit responsive to the request for thread identification to transfer data to a particular target thread, the data transfer circuit including a control circuit responsive to the ready signal, and responsive to a start-of-packet indicator and an end-of-packet indicator and a drop-packet indicator, and further responsive to a multi-bit thread identification of a thread that is currently occupying the buffer circuit.

Generally, yet another form of the invention involves a control method for packet processing. The control method includes host-loading a first storage area with a context including control data and processing instructions for processing at least part of a packet, supplying a stream of packets including a particular packet to a packet processing subsystem, the particular packet including a pointer to a context in the first storage area; operating the packet processing subsystem to access the context from the first storage area for use in the packet processing subsystem in accordance with the pointer, and processing the stream of packets in the packet processing subsystem in accordance with the control data and processing instructions in the context.

Generally, another further form of the invention involves an electronic method of processing packets. The method includes providing a set of accelerator engines and at least one separate control engine, receiving packets from a stream using an electronic interface, electronically chunking the packets into chunks in a memory, the chunks being generally shorter than their packets and at least one of the chunks having associated control information, operating the separate control engine in response to the control information to electronically generate and store a sequence of engine identifications representing a pipelined process by selected ones of the accelerator engines one after another according to the sequence; and coupling and operating the accelerator engines responsive to the stored sequence of engine identifications so that a first accelerator engine having the first engine identification in the sequence processes a series of the chunks to produce resulting chunks, and a second accelerator engine having the second engine identification in the sequence processes the resulting chunks from the first accelerator engine beginning substantially as soon as the first of the resulting chunks comes from the first accelerator engine, whereby the stream of packets is pipeline-processed.
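By way of illustration only, and without limitation, the pipelined method just described can be sketched in software. The sketch below is not taken from the disclosure: the engine identification values, the names `stage`, `run_pipeline` and `ENGINES`, and the two toy engines are hypothetical, and the toy engines merely stand in for hardware accelerators (they perform no real cryptography). Lazy generators are used so that the second engine begins on the first resulting chunk as soon as the first engine produces it.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Engine = Callable[[bytes], bytes]

# Hypothetical engine identifications (illustrative values only).
ENCRYPT_ID = 1
AUTH_ID = 2

# Records which engine touched which chunk, in order of execution.
trace: List[Tuple[str, bytes]] = []


def toy_encrypt(chunk: bytes) -> bytes:
    """Stand-in for an encryption accelerator: XOR with a constant (NOT real cryptography)."""
    trace.append(("encrypt", chunk))
    return bytes(b ^ 0x5A for b in chunk)


def toy_auth(chunk: bytes) -> bytes:
    """Stand-in for an authentication accelerator: append a one-byte checksum (NOT a real MAC)."""
    trace.append(("auth", chunk))
    return chunk + bytes([sum(chunk) % 256])


ENGINES: Dict[int, Engine] = {ENCRYPT_ID: toy_encrypt, AUTH_ID: toy_auth}


def stage(engine: Engine, upstream: Iterable[bytes]) -> Iterator[bytes]:
    """Apply one engine lazily, so each result is available downstream as soon as it is produced."""
    for chunk in upstream:
        yield engine(chunk)


def run_pipeline(chunks: Iterable[bytes], engine_ids: List[int]) -> Iterator[bytes]:
    """Couple the engines one after another in the order given by the sequence of engine IDs."""
    stream: Iterator[bytes] = iter(chunks)
    for engine_id in engine_ids:
        stream = stage(ENGINES[engine_id], stream)
    return stream


if __name__ == "__main__":
    results = list(run_pipeline([b"chunk-0", b"chunk-1"], [ENCRYPT_ID, AUTH_ID]))
    # The trace shows the two engines interleaving chunk by chunk.
    print([name for name, _ in trace])
```

Reordering the list of engine identifications (for example, authentication before encryption) changes the topology without changing either engine, which is the point of driving the coupling from a stored sequence.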

Generally, and in still another form of the invention, a packet interface circuit includes a control circuit operable to receive packets each having a header and a payload, some of the packets representing a first stream, and some others of the packets representing a second stream, the control circuit operable to assign thread identifications identifying each such stream, a memory, and a chunking circuit operable, when a given such packet has a payload exceeding a predetermined length, to store chunks in the memory so that the chunks have the predetermined length or less, and the chunking circuit operable to load chunk control information into the memory, the control information indicating start of packet (SOP), middle of packet (MOP), and end of packet (EOP), depending on the position in the payload of data in a given stored chunk.
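As a non-limiting software sketch of the chunking behavior described above, a payload can be split into chunks of at most a predetermined length, each tagged with a thread identification and SOP/MOP/EOP control information. The names `Chunk` and `chunk_payload`, the field layout, and the choice to mark a payload that fits in a single chunk as both SOP and EOP are assumptions made for illustration, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    thread_id: int  # identifies the stream the packet belongs to
    sop: bool       # start-of-packet flag
    eop: bool       # end-of-packet flag
    data: bytes

    @property
    def mop(self) -> bool:
        """A middle-of-packet chunk is one that is neither first nor last."""
        return not self.sop and not self.eop


def chunk_payload(payload: bytes, thread_id: int, max_len: int) -> List[Chunk]:
    """Split a payload into chunks of at most max_len bytes with SOP/MOP/EOP markings."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # An empty payload still yields one (empty) chunk so the packet boundary is preserved.
    pieces = [payload[i:i + max_len] for i in range(0, len(payload), max_len)] or [b""]
    last = len(pieces) - 1
    return [
        Chunk(thread_id=thread_id, sop=(i == 0), eop=(i == last), data=piece)
        for i, piece in enumerate(pieces)
    ]


if __name__ == "__main__":
    for c in chunk_payload(bytes(600), thread_id=7, max_len=256):
        print(c.thread_id, len(c.data), c.sop, c.mop, c.eop)
```

Concatenating the `data` fields of the chunks in order reproduces the original payload, so a downstream stage can reassemble the packet using only the flags and the thread identification.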

Generally, a further process form of the invention involves a communication method for control communication between processors. The communication method includes electronically breaking ingress packets into smaller chunks, one of the chunks for a packet being a start-of-packet chunk having associated control information, operating one or more programmable control-plane engines to process such a start of packet chunk and form at least one command to organize a set of data plane engines into a particular pipeline topology, and selectively operating the data-plane engines programmably to process the chunks in accordance with the command, whereby to effectuate at least one of a plurality of packet processing modes.

Generally, and in a yet further form of the invention, an electronic buffering circuit includes at least three processors each having inputs and outputs and identified by respective engine identifications, and at least one of the processors operable to generate particular engine identifications of at least two of the processors; a plurality of buffers at least equal in number to the plurality of processors; and a selection circuit responsive to controls based on the engine identifications of the processors for any-order interconnection of a selected processor-buffer-processor topology.

Generally, and in another additional form of the invention, a packet-processing electronic subsystem includes a first data interface for first streaming data, a second data interface for second streaming data, a scheduler circuit coupled to the first and second data interfaces and including a packet memory, a security context cache module coupled for input from, and output to, the scheduler circuit, the security context cache module including a cache controller and a cache storage for at least one security context, a packet header processing module coupled for input from, and output to, the scheduler circuit, an authentication module coupled for input from, and output to, the scheduler circuit; and an encryption module coupled for input from, and output to, the scheduler circuit and the encryption module including control circuitry and encryption accelerators responsive to a security context in the security context cache module to operate the encryption module and the authentication module as specified by the security context and the packet header processing module.

Other processors, circuits, devices and systems and processes for their operation and manufacture are disclosed and claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an inventive subsystem for efficient cryptographic acceleration.

FIG. 1A is a four-quadrant diagram of processing parallelism of the inventive subsystem of FIG. 1, e.g., in Internet and wireless, and in a control plane and a data plane.

FIG. 2 is a diagram of memory spaces related by pointers, and including inventive security data structures acting as a data sink receive queue at top and data source transmit queue at bottom which are established or supported by a host processor and the subsystem embodiment of FIG. 1.

FIG. 3 is a composite diagram of packets and descriptors therein together with storage spaces for a security context and data buffer space(s) used for the inventive security data structures of FIG. 2.

FIG. 4 is a partially block, partially flow, diagram wherein the inventive subsystem of FIG. 1 adaptively organizes a programmable structure called a logical topology for IPSEC outbound and IPSEC inbound packets.

FIG. 5 is a partially block, partially flow, diagram wherein the inventive subsystem of FIG. 1 adaptively organizes a programmable structure (logical topology) for Air cipher/Stream cipher.

FIG. 6 is a storage space diagram of an inventive security context data structure for IPSEC in ESP mode, such as to support FIG. 4, or alternatively with an inventive security context for SRTP.

FIG. 7 is a storage space diagram of an inventive security context data structure that supports FIG. 5 for Air Cipher inbound and outbound. Notice that the order of integrity and encryption in the security context is reversed in this example depending on the Outbound or Inbound operation.

FIG. 8 is a block diagram of an inventive security context cache for the security context data structures such as those of FIGS. 6 and 7.

FIG. 9 is a storage space diagram of an inventive internal buffer format with associated buffer pointer positions and that is provided for an inventive process of chunking packets.

FIG. 10 is a block diagram of an inventive encryption module in the subsystem embodiment of FIG. 1.

FIG. 11 is a block diagram of an inventive mode control engine (MCE) for use in the encryption module of FIG. 10 and in the Air Cipher module of FIG. 14.

FIG. 12 is a block diagram of an inventive authentication module in the subsystem embodiment of FIG. 1.

FIG. 13 is a block diagram of an inventive packet header processing (PHP) module in the subsystem embodiment of FIG. 1.

FIG. 14 is a block diagram of an inventive Air Cipher module in the subsystem of FIG. 1 and that uses the MCE embodiment of FIG. 11.

FIG. 15 is a flow diagram of an inventive process for initialization of the subsystem of FIG. 1.

FIG. 16 is a flow diagram of an inventive process for setting up a security context for the subsystem embodiment of FIG. 1.

FIG. 17 is a flow diagram of an inventive process for tearing down a security context for the subsystem embodiment of FIG. 1.

FIG. 18 is a flow diagram of an inventive process for evicting a security context in FIGS. 1 and 8 for the subsystem embodiment of FIG. 1.

FIG. 19 is a flow diagram of an inventive process for issuing Engine IDs for multiple execution passes for the subsystem embodiment of FIG. 1.

FIG. 20 is a block diagram of an inventive secure telecommunication and processing system combination with structures and processes as disclosed herein.

FIG. 21 is a flow diagram of a process for inventive mode processing code assembly for the FIG. 11 MCE embodiment.

FIG. 22 is a flow diagram of a process for inventive mode processing in FIGS. 1 and 11 by the MCE embodiment according to assembly code generated according to FIG. 21.

Corresponding numerals in different Figures indicate corresponding parts except where the context indicates otherwise. A minor variation in capitalization or punctuation or spacing, or lack thereof, for the same thing does not necessarily indicate a different thing. A suffix .i or .j refers to any of several numerically suffixed elements having the same prefix.

DETAILED DESCRIPTION OF EMBODIMENTS

To solve the above noted problems and other problems, smart, scalable high performance, configurable cryptographic engines (occasionally referred to as CP_ACE herein) provide an example of a remarkable, adaptive subsystem category of embodiments, allowing multiple security standards like IPSEC, SRTP, TLS, WiMax, wireless 3G and wireless 4G to be processed concurrently and efficiently using the same processing engines. The subsystem embodiment of FIG. 1 is adaptive, adapted, or adaptable by allowing firmware-controlled security header processing and hardware-driven, any-order data staging, cipher block formatting and cryptographic processing.

Such subsystem embodiments can satisfy extremely high bit-rate demands and provide a rich feature set to accommodate industry cryptography standards to carry out content encryption and authenticity validation for wire-side and wireless-side traffic. Moreover, these embodiments can provide anti-replay protection and resist other types of security attacks.

A form of the subsystem employs multiple engines that primarily process streams of data and controllably separates or segregates them from one or more additional engines that primarily perform control functions and responses to conditions, thereby establishing a data plane and a control plane herein. The separation desirably avoids or obviates blocking effects that might otherwise arise between control plane processing and data plane processing, while the control plane schedules and otherwise controls the data plane. The separable data planes and their independent control avoid stalling of either plane by the other plane. A host processor is also provided that can call the subsystem and further is free to itself selectively use the data plane and bypass the control plane, e.g. without engaging control plane components. Two-way register access between control plane and data plane promotes monitoring, control of blocks and their topology, and controllable separation. The cut-through structure separates the data plane from the control plane, or generally provides a parallel control information transfer path in one circuit half or control plane as compared with a data transfer path in another circuit half or data plane for true pipelined processing. That way, no stall arises even if delays occur in either the control plane or data plane.

The subsystem preserves and enhances Quality of Service (QoS) by automatically breaking a data packet into small chunks and scheduling these data chunks based on configured or requested QoS level. Such QoS level indicates or represents packet stream priority and is used by the subsystem to control and/or establish subsystem latency (packet throughput delay) and data rate, for instance. This important ability to switch within-a-packet allows QoS preference, such as to give higher priority to packets of another type or QoS level, to be effective immediately. Some modes or packet types may automatically have a particular QoS level associated with them in the configuration.
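The within-a-packet switching described above can be illustrated, without limitation, by a small priority-queue sketch. The class name `ChunkScheduler`, the convention that a lower number means a higher QoS priority, and the use of arrival order as a tie-breaker are all assumptions made for this example; the disclosure does not specify the scheduling algorithm at this point.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple


class ChunkScheduler:
    """Dispatch chunks by QoS level first, then by arrival order within a level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, int]] = []
        self._arrival = count()  # monotonically increasing tie-breaker

    def submit(self, qos_level: int, stream: str, chunk_index: int) -> None:
        """Queue one chunk; assumed convention: lower qos_level means higher priority."""
        heapq.heappush(self._heap, (qos_level, next(self._arrival), stream, chunk_index))

    def dispatch(self) -> Optional[Tuple[str, int]]:
        """Return the next (stream, chunk_index) to process, or None if idle."""
        if not self._heap:
            return None
        _, _, stream, chunk_index = heapq.heappop(self._heap)
        return stream, chunk_index


if __name__ == "__main__":
    sched = ChunkScheduler()
    for i in range(3):
        sched.submit(5, "bulk", i)   # a long, low-priority packet in three chunks
    print(sched.dispatch())          # the first bulk chunk starts processing
    for i in range(2):
        sched.submit(1, "voice", i)  # a higher-priority packet arrives mid-packet
    while (item := sched.dispatch()) is not None:
        print(item)
```

Because the unit of scheduling is the chunk rather than the packet, the higher-priority stream is served at the next chunk boundary instead of waiting for the whole lower-priority packet to finish.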

The high performance, adaptive, and configurable cut-through embodiments with internal data chunking allow multiple security standards to be processed concurrently at high bit rate and low latency. Mere updates to firmware for the subsystem confer the ability to support new standards in the field. Such a subsystem processes packets in data chunks, thereby giving the ability to switch within-a-packet to a new higher priority packet, thereby preserving and enhancing Quality of Service.

The subsystem of FIG. 1 hosts a security context cache module of FIG. 8 that fetches and evicts a respective control data structure for each security context that holds information like cryptographic keys and modes from external memory 120 on a demand basis. This information is fanned out to the data processing engine(s) automatically by hardware before data is processed. Optionally the control structure itself can be encrypted to safeguard access to keys in external memory 120. Arbitrated port controllers are coupled to a data lookup cache portion and to a security context cache portion, further effectuating the parallelism of control plane and data plane in the cache structure of such subsystem embodiments.

The subsystem circuitry partially constructs a security context as a control plane operation in the local context cache store by an access to host memory. The circuitry also acts to process an incoming packet into packet chunks each including a portion of data from an incoming packet and to affix control information into at least one such chunk. The subsystem provides a further contribution to the construction of the security context in the local context store from the control information in the packet chunk in the data plane. Interlocked security is thus flexibly provided by operations in both the control plane and the data plane.

Moreover, the subsystem introduces both control plane/data plane parallelism and cryptographic parallelism such as for Internet and wireless concurrently. This constitutes a two-dimensional streaming parallelism in four quadrants (see FIG. 1A) for control/data and Internet/wireless cryptographic and other processing that can with dramatic efficiency securely handle the real-life applications that users care about now and in the future.

Cryptography processing is conventionally very expensive and burdensome on a main CPU (or array of CPU's) at least because new security standards require more data and instruction bandwidth and processing in conjunction with the high incoming packet rate. The subsystem embodiments described herein offer tremendous advantage since the process of operation offloads data security related processing from the main CPU (host processor or array, see FIG. 20) and at the same time supports multiple security standards at high performance. The subsystem also provides a direct mode in which one or more such CPUs can directly engage hardware cryptographic cores to process non-packet (non-standard) data.

The field of Cryptography processing has numerous acronyms, and TABLE 1A provides a Glossary for some of them. TABLE 1A also illustrates the diversely extensive numerousness of these processing operations demanded for execution at high rates.

TABLE 1 GLOSSARY OF CRYPTOGRAPHY AND COMMUNICATIONS

Acronym      Description
AAD          Additional authenticated data (for Galois)
AES          Advanced Encryption Standard
AES-CMAC     Advanced Encryption Standard Cipher-based Message Authentication Code
Air Cipher   Cipher to protect wireless over-the-air communications
AH           Authentication Header, part of IPSEC
CBC*         Cipher Block Chaining
CBC-MAC      Cipher Block Chaining - Message Authentication Code
CCM*         Counter with CBC-MAC
CFB*         Cipher Feedback
Cipher       Procedure for performing encryption or decryption
CTR*         Counter. An encryption mode.
DES          Data Encryption Standard
DFC          Decorrelated Fast Cipher
DSL          Digital Subscriber Line, type of wired network over telephone line
ECB          Electronic Code Book
Ethernet     Type of wired network using cabling on premises between computers
ESP          Encapsulating Security Payload, part of IPSEC packet protection
A5/3         GSM key stream generator
F8#          A confidentiality process in UMTS, uses Kasumi
F9#          An integrity process in UMTS, uses Kasumi
FIPS         Federal Information Processing Standards
GCM*         Galois Counter Mode
GMAC         Galois Message Authentication Code
GPRS         General packet radio service. A wireless standard.
HMAC         Hashed Message Authentication Code
IETF         Internet Engineering Task Force
IKE          Internet Key Exchange
IPSEC        Internet Protocol Security
Kasumi       Block cipher in UMTS, GSM, GPRS. An Air Cipher.
LAN          Local Area Network
MACSEC       Media Access Control Security, IEEE 802.1AE, e.g., for Ethernet
MD5          Message Digest 5
NIST         National Institute of Standards and Technology
OFB*         Output Feedback. An encryption mode.
RFC          Request for Comment
SHA          Secure Hash Algorithm
Snow3G       Word-oriented stream cipher, an Air Cipher
SRTP         Secure Real-time Transport protocol
SS           Subscriber Station
SSL          Secure Socket Layer
TLS          Transport Layer Security
UMTS         Universal Mobile Telecommunications System. A wireless standard.
WLAN         Wireless Local Area Network
3DES         Triple DES
3GPP         3rd Generation Partnership Project

*Examples of confidentiality modes that use AES or 3DES cores are CBC, OFB, CFB, CTR, GCM, and CCM, which are used in IPSEC applications.
#Kasumi-F8 and Snow3G-F8 are used in 3GPP technology to secure data traffic transmitted via antenna, hence the phrase Air Cipher herein.

TABLE 1B provides another Glossary for acronyms used to describe the embodiments.

TABLE 1B GLOSSARY OF BLOCKS AND DATA STRUCTURES

Acronym      Description
CDMA         CPPI DMA controller (distinct from wireless CDMA next)
CDMA         Wireless code division multiplex for telecom
CMD          Command
CPPI         Communication Processor Peripheral Interface
CP_ACE       Accelerated Cryptographic Engine. Subsystem example of embodiment.
CTR          Counter
CTX          Context
ctxcach      Context Cache
DDR          Double Data Rate, type of RAM
DMA          Direct Memory Access, peripheral circuit
EMIF         External Memory Interface
EOP          End of Packet
FW           Firmware, e.g. software stored in flash non-volatile memory.
HFN          Hyperframe Number
HW           Hardware
IV           Initialization Vector, for key derivation
LSB          Least Significant Bit
MCE          Mode control engine, another type of embodiment
MMR          Memory Mapped Register
MOP          Middle of Packet
MSB          Most Significant Bit
PA           Packet accelerator
PHP          Packet header processor
PDSP         Packed Data Structure Processor, another type of embodiment: programmable engine for parsing a packet header, trailer, and payload
PKA          Public Key Accelerator
RAM          Random Access Memory
RISC         Reduced Instruction Set Computing or Computer
RNG          Random Number Generator
ROC          Rollover Counter
SC Pointer   Security context pointer holding data structure in host memory
SCCTL        Security context control word, TABLES 19, 10.
SCID         Security context ID
SCIDX        Security index
SCPTR        Security context pointer
SOP          Start Of Packet
sw           Software or firmware
SW           Software Word
VBUSP        VBUS Protocol bus signaling protocol

Embodiments exemplified by the subsystems described at length herein areflexible and adaptive thereby allowing new security standards andapplication-specific encryption operational modes to be updated in thefield. Various embodiments provide a high performance, loosely coupledpacket engine to encrypt, decrypt and authenticate data on-the-flythereby maintaining a suitably-specified wire-rate or wireless rate, andto perform a threshold level of security monitoring on inbound trafficto provide sanity and integrity checks to protect host processor 100from unwanted traffic. Minimal intervention from host 100 is involved toprocess data, but at same time the host 100 is fully in control of suchprocessing. The subsystem can cache high-speed connections keys andcontrol, thereby promoting efficient high speed execution. Auto-fetchkeys and control structures from host memory are provided in securefashion as and when appropriate, so that the system is secure whencaching high-speed connections keys and control. Some embodimentsprovide direct cryptographic processing acceleration to host 100 toencrypt/authenticate raw data (non-packet), especially for multi-mediaapplications.

A public key accelerator (PKA) aids host 100 for keygeneration/derivation mainly for IKE and other similar processes. Anon-deterministic true random number generator (TRNG) is provided and ishost-accessible. A high performance, link-list based, descriptor-drivenscatter-gather CPPI DMA (direct memory access) can queue packets.Firmware is updatable in the field to enhance/support new processingfeatures such as new header processing features and other features.

The system has a remarkable structure and process to update micro-instructions in the field to support new encryption operation modes like CCM etc.

High Level protocols supported include 1) transport mode for both AH and ESP processing for IPSEC protocol stack, 2) tunnel mode for both AH and ESP processing, 3) full header parsing and padding checks, 4) construction of initialization vector IV from header, 5) anti-replay attack resistance, 6) SRTP protocol stack to support F8 mode of processing and replay protection, 7) WiMax encryption, 8) 3GPP protocol stack, 9) Wireless Air cipher standard, 10) A5/3 mode, 11) firmware enhancements for SSL and MACSEC.

TABLE 2
PERFORMANCE EXAMPLE

Protocol              Mbits/sec
IPSEC − ESP           1400
IPSEC − AH            1400
3GPP                  400
SRTP                  400
Legal co-existence
IPSec + SRTP          1800 (Total)
IPSec + 3GPP          1800 (Total)

In the keys and control structure, host 100 forms a security context under which the hardware encrypts and decrypts keys, provides connection-specific control flags, anti-replay windows, and firmware parameters, and establishes static connection values such as a nonce or a salt. (A nonce is a security string or number used once. A salt is a random value input used along with a password in key derivation.)

The system in one example supports up to 32,768 (or 2¹⁵) simultaneous connections or more. Setup is as easy as sending a packet pertaining to that connection. Host 100 can lock high-speed connections. Any connection can be smoothly torn down.

A control structure is auto-fetched on a demand basis, as and when requested, to cache up to 64 security contexts or more. A security context is cached permanently if locked by host 100. Also, host 100 is operable to automatically evict old connections to make room for new connections.
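By way of non-limiting illustration only, the demand fetch, host lock, and automatic eviction behavior just described can be modeled in software as follows. This is a minimal sketch and not the hardware implementation: the class name SecurityContextCache, its method names, and the fetch_from_host callback are hypothetical, and the least-recently-used eviction order is an assumption, since the text states only that old connections are evicted to make room.

```python
from collections import OrderedDict


class SecurityContextCache:
    """Toy software model of an on-chip security context cache (hypothetical API)."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # scid -> context, least recently used first
        self.locked = set()           # scids pinned by the host

    def lookup(self, scid, fetch_from_host):
        """Return the context for scid, fetching it on demand on a miss."""
        if scid in self.entries:
            self.entries.move_to_end(scid)  # mark as most recently used
            return self.entries[scid]
        if len(self.entries) >= self.capacity:
            self._evict_one()
        context = fetch_from_host(scid)     # stands in for the auto-fetch
        self.entries[scid] = context
        return context

    def lock(self, scid):
        """Pin a cached context so that it is never auto-evicted."""
        if scid not in self.entries:
            raise KeyError(scid)
        self.locked.add(scid)

    def teardown(self, scid):
        """Remove a connection's context and any lock on it."""
        self.entries.pop(scid, None)
        self.locked.discard(scid)

    def _evict_one(self):
        # Assumption: evict the least recently used unlocked context.
        for scid in self.entries:
            if scid not in self.locked:
                del self.entries[scid]
                return
        raise RuntimeError("all cached contexts are locked")
```

In this sketch a locked context stays cached until teardown is called, mirroring the statement that a security context is cached permanently if locked by host 100.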

Some embodiments secure the security context itself, and/or fetch the connection in secure mode using secure infrastructure.

In FIG. 1, hardware 200 in one embodiment has a Two-Plane architecture herein including a data plane 300 and a control plane 400. The data plane 300 supports cryptographic payload processing by providing and utilizing modules for authentication processing 320, encryption processing 310, air ciphering 370, public key acceleration PKA, and a true random number generator TRNG. Further, as shown in FIGS. 1, 10, 11, 12, 13, and 14, the planes cut through each of the just-noted modules and the packet header processing PHP modules 410 and 460. The data plane involves the blocks or sub-blocks primarily involved with handling packets “p” (packet data). The control plane involves the sub-blocks primarily involved with handling packets “c” for control data, packets “_” (unmarked) for scheduler data, and packets “f” for configuration data. PKA and TRNG by having lines marked “f” represent a slight legend exception to the foregoing generalization, and PKA and TRNG partake of data-plane. The basic structure and benefits of the distinction between planes are nonetheless consistent throughout.

The control plane includes one or more packet header processing PHP modules and provides Ingress header checks and Egress header updating. The special CPPI IO's along with these data and control planes provide a high-performance streaming interface.

In both control plane and data plane, shared crypto core hardware is provided for IPSec, SRTP and Transport layer, thereby saving integrated circuit real estate expense. The architecture segregates the data plane 300 from the control plane 400 (or generally provides a parallel control information transfer path in upper half as compared with data path in lower half in FIGS. 10, 11, 12, 13, and 14) for true pipelined processing, so no stall arises even if delays occur in either the control plane or data plane. The fully-pipelined engine, or structure e.g. of FIG. 1, supports Encryption and Authentication simultaneously, and also provides any-order staging, such as AES followed by SHA or SHA followed by AES, or AES1 followed by AES2, for some examples.

In data plane 300 (or cut-through data-related portion in FIGS. 10, 11, 12, 13, and 14), a cryptographic payload processing module provides authentication 320 processing for SHA1 and AES (used for authentication too), MD5, and SHA2, for instance. Keyed HMAC (Hashed Message Authentication Code) operation via hardware core using MD5, SHA1, SHA2-224 and SHA2-256, and support for truncated authentication tag are included.
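As a non-limiting software analogue of keyed HMAC operation with a truncated authentication tag, the following sketch uses the Python standard library rather than any hardware core described herein. The function names truncated_hmac and verify and the default 12-byte tag length are assumptions chosen for illustration.

```python
import hmac


def truncated_hmac(key: bytes, message: bytes, digest: str = "sha256",
                   tag_len: int = 12) -> bytes:
    """Compute an HMAC over message and keep only the first tag_len bytes."""
    full_tag = hmac.new(key, message, digest).digest()
    if not 0 < tag_len <= len(full_tag):
        raise ValueError("tag_len must be between 1 and the digest size")
    return full_tag[:tag_len]


def verify(key: bytes, message: bytes, tag: bytes, digest: str = "sha256") -> bool:
    """Recompute the truncated tag and compare it in constant time."""
    expected = truncated_hmac(key, message, digest, len(tag))
    return hmac.compare_digest(expected, tag)
```

The digest argument accepts names such as "md5", "sha1", "sha224", and "sha256", corresponding to the hash functions named above.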

Block data encryption is supported via respective hardware cores for processing AES, DES, 3DES, and Galois multiplier, see module 310. Supported Air Ciphers include Kasumi and Snow3G for stream data encryption, see module 370. Security context architecture has on-chip cache (FIG. 8) with auto-fetch and can cache 64 contexts and auto-Evict or auto-Fetch a Security context on a demand basis. A Public Key Accelerator module includes a high performance, public key engine for large vector math operation and supports a modulus size up to 4096-bits or more for public key computations. Further, the Cryptographic Payload processing module(s) in the data plane has a True Random Number Generator TRNG that is non-deterministic and FIPS compliant. Null cipher and null authentication support debugging.

Further in FIG. 1, the independent control plane and data plane architecture allows host 100 to selectively use only data plane 300 components while bypassing the control plane 400. In a cut-through mode of operation, packets are processed as and when received, without waiting for the complete packet to finish. Packets are processed in chunks thereby ensuring that all the hardware engines are fully engaged. The context cache module 510 is coupled for auto fetch of security context based on current state of an engine, and pre-fetch of security context is based on information available from an ingress FIFO. An option allows storage of security context within an engine for high performance connections. Auto-eviction of security context is based on unavailability of space within the context cache in FIG. 8. Fully pipelined engines for parallel processing allow multiple processing on a same payload by auto-forwarding to next engine.

To avoid limitlessly accumulating mode-specific hardware cores for multiple modes like CBC, OFB, CFB, CTR, GCM, CCM and other modes, a remarkable programmable Mode Control Engine MCE of FIG. 11 herein sequences various logical and arithmetic operations and other instructions to achieve each desired encryption/authentication operational mode and leverage the speed of associated hardware crypto cores. The sequence of operations is contained in a set of instructions that are stored as part of the security context in the memory. MCE also has registers (e.g., four registers each 128-bit such as in its Register Bank) to store the immediate result after each operation. In addition, the security context of FIG. 2 in memory stores encryption and authentication key and some other security parameters such as Initial Vector (IV), encryption mode, authentication tag length and location, data offset and security process details. Many of the MCE instructions as in TABLE 13 are also specifically set up to have direct access to these parameters.
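To illustrate, in a non-limiting way, how an operational mode can be expressed as a sequence of simple operations wrapped around a block cipher core, the following sketch builds CBC mode from an XOR step, a register holding the previous result, and a call into a core. It does not use the MCE instruction set of TABLE 13. The functions toy_core_encrypt and toy_core_decrypt are hypothetical placeholders standing in for a hardware core such as AES and are deliberately not secure.

```python
BLOCK = 16  # bytes, matching one 128-bit register


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_core_encrypt(key: bytes, block: bytes) -> bytes:
    """Placeholder for a cipher core. NOT secure: XOR with key, rotate left one byte."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_core_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_core_encrypt: rotate right one byte, XOR with key."""
    unrotated = block[-1:] + block[:-1]
    return xor(unrotated, key)


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC as a sequence: XOR with register, run core, store result in register."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    reg = iv  # register holding the previous ciphertext block
    out = []
    for i in range(0, len(plaintext), BLOCK):
        reg = toy_core_encrypt(key, xor(plaintext[i:i + BLOCK], reg))
        out.append(reg)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Inverse sequence: run core, XOR with register, load register with input."""
    if len(ciphertext) % BLOCK:
        raise ValueError("ciphertext must be a multiple of the block size")
    reg = iv
    out = []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(toy_core_decrypt(key, block), reg))
        reg = block
    return b"".join(out)
```

Replacing only the operation sequence around the same core would yield a different mode, which is the motivation given above for a programmable engine over mode-specific cores.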

In the control plane 400 (or cut-through control-related portion in FIGS. 10, 11, 12, 13, and 14), a cryptographic control plane processing module includes two instances 410, 460 of a PHP (Packet header processor) of FIG. 13 and has a 32-bit Low gate count RISC CPU (PDSP) header processing engine for programmable protocol-related packet header and trailer and payload parsing for true 64K bytes packet processing, padding checks, security procedure control and decode, 16K of instruction RAM, and 8K scratch-pad RAM in one set of RAM size parameters for an implementation example. A hardware-accelerated security context viewer module is provided, as well as a hardware-accelerated packet viewer module. A special data engine designated CDE is beneficial for packet type application and allows hardware accelerated bytes insertion and removal from any packet.

Software and firmware architecture includes firmware for IPSEC, firmware for SRTP, and firmware for 3GPP, and firmware that schedules the processing for the hardware engines. A driver layer is provided.

In FIGS. 1 and 2, the subsystem 200 includes and uses FIFOs and CDMA (CPPI DMA Communication Processor Peripheral Interface direct memory access controller) to fetch packet descriptors and buffers contents from a system 3500 such as in FIG. 20. In FIG. 2, subsystem 200 (3540) maintains a receive queue (Queue X) for ingress for the security accelerator. Receive queue holds one or more Host Packet Descriptors that each have 1) a handle to access a security context buffer in a protocol-specific part, and 2) a pointer to a Data Buffer for data or from which to access data. The security context (SC) buffer holds security context information that is collectively called a Security Context. The Security Context includes information such as encryption and authentication key, initialization vector (IV), encryption mode, authentication tag length and location, and data offset and other security process details. The Data Buffer holds SOP (start of packet), EOP (end of packet), and a block of data to be cryptographically processed such as by encryption, decryption, authentication, or otherwise according to the encryption mode.

In FIG. 2, subsystem 200 also maintains a transmit queue (Queue Y) used for egress with the security context, and multiple transmit queues are established for multiple concurrent security contexts. A transmit queue holds a Host Packet Descriptor that contains 1) a handle to access the security context buffer or an output security context buffer in the protocol-specific part, 2) a pointer to the Data Buffer or to an output data buffer from which to access data, and 3) extended packet information, such as to indicate whether the security context has been updated. The security context (SC) buffer for transmit purposes not only holds the Security Context as already described but also any updated ROC (rollover count), HFN (hyperframe number), etc. The Data Buffer for transmit purposes holds SOP (start of packet), EOP (end of packet), and a block of output data resulting from the cryptographic processing.

In FIG. 3, receive operations relate to that receive queue of FIG. 2 and involve ingress of a series of packets each having a plaintext PDU (Protocol Data Unit) header and packet payload data arriving for cryptographic processing. Host Packet Descriptors correspond to the packets and have a pointer that points to the data buffer block of data to be decrypted or encrypted. Such Host Packet Descriptor has one or more protocol-specific fields that point to the Security Context or fields therein. These receive operations also relate to the chunking of the packets by subsystem 200, i.e. breaking a data packet on ingress into smaller data chunks.

In FIG. 4, the subsystem 200 of FIG. 1 adaptively organizes a programmable structure called a logical topology for IPSEC outbound and IPSEC inbound packets using its IPSEC PHP 410 in FIG. 1. (See FIG. 13 for a PHP detail that is used in each of the IPSEC PHP and Air Cipher PHP and that uses a processor PDSP.) For IPSEC outbound packets, first pass packet header processing by IPSEC PHP 410 is followed in FIG. 4 by Encryption SS 310, then Authentication SS 320, and then IPSEC PHP pass 2 processing. See also the associated security context of FIG. 6 and FIG. 3. For IPSEC inbound packets, first pass packet header processing by IPSEC PHP 410 is followed in FIG. 4 by Authentication SS 320, then Encryption SS 310 (decryption), and then IPSEC PHP pass 2 processing. If one IPSEC packet stream is outbound while another IPSEC packet stream is inbound, then both forms of processing in FIG. 4 can be set up and executed concurrently. Buffering 250.i supports the logical topology, such as cascade or serial nature of the outbound and inbound processes. Indeed, the subsystem of FIG. 1 not only effectively supports either of those FIG. 4 processes individually but also is or can be relatively evenly loaded while supporting both of those FIG. 4 processes concurrently. This is because the chunks (FIG. 9) are likely to be of similar size, and the differing order of operations for outbound and inbound readily have a FIG. 1 encryption block 310 running for outbound while an authentication block 320 is running for inbound, and vice versa. Notice that the buffers 250.i in FIG. 4 are some of the FIFO buffers at the inputs of Crypto Data and Scheduler SCR 260 of FIG. 1 and any buffers in the blocks or modules themselves. Under the configured or programmably established logical topology, those buffers of FIG. 1 are re-arranged or selectively multiplexed into whatever operational order (such as in the examples of FIGS. 4 and 5) is specified to establish a particular currently-employed process or future process. These processes can be in one security context or in plural security contexts such as represented by any one or more of various forms of FIGS. 6 and 7 and FIGS. 2 and 3.
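The outbound and inbound orderings described for FIG. 4 can be written down as data, purely as a non-limiting sketch. The stage names and the helper functions below are hypothetical labels for the modules named in the text. The check shows why concurrent outbound and inbound streams tend to load the data-plane modules evenly: at each data-plane step the two directions occupy different modules.

```python
# Hypothetical stage labels for the IPSEC logical topologies of FIG. 4.
IPSEC_TOPOLOGY = {
    "outbound": ["IPSEC_PHP_PASS1", "ENCRYPTION", "AUTHENTICATION", "IPSEC_PHP_PASS2"],
    "inbound": ["IPSEC_PHP_PASS1", "AUTHENTICATION", "ENCRYPTION", "IPSEC_PHP_PASS2"],
}


def data_plane_stages(direction: str) -> list:
    """Return only the data-plane stages, dropping the two PHP passes."""
    return [s for s in IPSEC_TOPOLOGY[direction] if not s.startswith("IPSEC_PHP")]


def engines_differ_per_step(a: str, b: str) -> bool:
    """True if, step by step, the two directions use different data-plane modules."""
    return all(x != y for x, y in zip(data_plane_stages(a), data_plane_stages(b)))
```

Here engines_differ_per_step("outbound", "inbound") evaluates to True, corresponding to an encryption block running for outbound while an authentication block runs for inbound, and vice versa.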

When FIG. 1 is considered in light of FIGS. 4 and 5, the logical topologies of FIGS. 4 and 5 or otherwise, are recognized as various programmably-helical paths (involving what are called “rounds” herein) that can be established adaptively in and in a sense form the structure of FIG. 1 into one or more coils (rounds) mediated by the Crypto Data and Scheduler SCR 260. Depending on context, the term “round” may also refer to a sequence of operations cycling through a same given subset of the modules among modules 410, 460, 310, 320, 370 and buffers 250.i. Notice the compatible lines for control plane and data plane throughout FIGS. 10-14. Multiple packet flows streaming into the PA and CDMA Ingress CPPI Streaming Interfaces are coiled at any given moment into logical topologies of approximately concurrent data flow and processing, and output data streams emerge out of the PA and CDMA Egress CPPI Streaming Interfaces. The various modules that concurrently participate in the different coils (rounds), and in what order for each coil (round), are established according to the Security Context Cache information and the Configuration SCR information. The operations of the modules are sequenced in a given coil (round). These operations appear to alternate or form other remarkable patterns of operation in space and time, as the remarkable CP_ACE subsystem 200 is configured and called and does its work.

In FIG. 5, the inventive CP_ACE subsystem 200 of FIG. 1 adaptively organizes a programmable structure (logical topology) for Air cipher/Stream cipher. For Air Cipher, first pass packet header processing by PDSP of FIGS. 1, 13 Air Cipher PHP 460 is followed in FIG. 5 by Air Cipher SS processing in the separate Air Cipher module 370 of FIGS. 1 and 14, and then further followed by FIG. 5 Air Cipher PHP 460 pass 2 processing. See also the associated security context of FIG. 7 and FIG. 3 and TABLE 5. Concurrently or otherwise for Stream cipher, first pass packet header processing by PHP 460 is followed in FIG. 5 by Stream Cipher SS in module 370, and then PHP 460 pass 2 processing. Buffering 250.i again supports the logical cascade or serial nature of these parallel processes so that the subsystem of FIG. 1 is relatively evenly loaded.

Notice that the logical topologies of both FIGS. 4 and 5 can be executed concurrently due to the additional level of parallelism of the subsystem 200. Accordingly, not only can subsystem 200 be characterized by control plane/data plane parallelism but also cryptographic parallelism such as illustrated for supporting Internet and wireless concurrently. Subsystem 200 embodiments thus also remarkably introduce a two-dimensional parallelism in four quadrants for control/data and Internet/wireless cryptographic and other processing to which the advantages commend them.

As illustrated by examples of FIG. 6 and FIG. 7, each individual security context per-connection accessible via Ctx Fetch VBUSP in FIG. 1 Host memory 120 (3520.3, 3550 in FIG. 20) is made up of three parts: Software-only section, PHP section, and data plane processing section. The Software only section holds the information that is used by software (DSP code) for managing security context and for storing connection-specific data, and this information does not need to be fetched by CP_ACE subsystem 200. The PHP section in FIG. 6 or 7 holds PHP control information used by each packet header processing (PHP) module 410 or 460 in subsystem 200 to maintain the current state of the connection along with data used to process packets. This PHP section in FIG. 6 or 7 is fetched and updated as needed using DMA 520 of FIG. 8. The third and fourth sections in FIG. 6 or 7 hold data plane processing (Encryption, Authentication, and/or Air Cipher) module-specific control and state information fetched by subsystem 200 as needed. Subsystem 200 does not need to write/update these data plane processing subsystem sections. To maximize the EMIF (external memory interface) efficiency, each FIG. 6 section starts at a 64-bytes aligned address, for instance. Hardware control structure is aligned to 64-bytes to allow cascading of multiple control structures.

In FIG. 6, a security context example is shown for IPSEC or SRTP in ESP mode as seen by DSP software. This context uses Authentication (SHA/MD5) and Encryption (AES/3DES). This flow is the same for both Inbound and Outbound. A Host pointer points to a 64-bytes Software-only section that is not fetched by CP_ACE. The SCPTR pointer of TABLE 10 points to a section in FIG. 6 that has SCCTL (8-bytes), a Packet Header processor (PHP) module-specific section, followed by an encryption module-specific section, and further followed by an Authentication module-specific section. The 56-bytes Packet Header processor (PHP) module-specific section is fetched by subsystem 200 and used for IPSEC header processing using PDSP and CDE engine and PHP Pass1/Pass2 Engine ID (TABLE 5). The 96-bytes Encryption module-specific section is fetched by subsystem 200 and used for IPSEC encryption using AES/3DES core and Encryption Pass1 Engine ID. See discussion of FIG. 10 and TABLES 11-12 later hereinbelow. The 96-bytes authentication module-specific section is also fetched by CP_ACE and used for IPSEC Authentication using SHA/MD5 core and Authentication Pass1 Engine ID. See discussion of FIG. 12 and TABLE 15 also.
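The section sizes given for the FIG. 6 example, taken together with the statement that each section starts at a 64-bytes aligned address, can be worked through as follows. This is a sketch under stated assumptions only: the text does not say how any gap after a 96-bytes section is handled, so the padding computed here is an inference, and the names align_up, layout, and IPSEC_ESP_SECTIONS are hypothetical. Offsets are relative to the address held in SCPTR.

```python
ALIGN = 64  # bytes; each section is assumed to start on this boundary


def align_up(offset: int, alignment: int = ALIGN) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(sections):
    """Map each (name, size) pair to (aligned start offset, size)."""
    offsets = {}
    cursor = 0
    for name, size in sections:
        cursor = align_up(cursor)
        offsets[name] = (cursor, size)
        cursor += size
    return offsets


# Sizes taken from the FIG. 6 discussion: SCCTL (8) plus PHP section (56),
# then the Encryption section (96) and the Authentication section (96).
IPSEC_ESP_SECTIONS = [
    ("SCCTL+PHP", 8 + 56),
    ("ENCRYPTION", 96),
    ("AUTHENTICATION", 96),
]
```

Under these assumptions the SCCTL word and PHP section together fill exactly one 64-bytes unit, the Encryption section begins at offset 64, and the Authentication section begins at offset 192 after 32 bytes of assumed padding.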

In FIG. 6, for SRTP, the three module-specific sections are used in the same way but have different numbers of bytes than used for IPSEC. Thus, multiple modes for IPSEC and for SRTP respectively are analogously supported by the same FIG. 1 hardware for PHP, encryption, and authentication.

In FIG. 7, another example of security context is provided for Air cipher Outbound, where encryption (Kasumi-F8) is done first, followed by Authentication (Kasumi-F9). In this case a same hardware engine is used twice. The order of Authentication/Encryption sections is beneficially reversed in FIG. 7 for Air Cipher Inbound. A 56-bytes Packet Header processor (Air Cipher PHP) module-specific section is fetched by subsystem 200 and used for Air cipher header processing using PDSP and CDE engine and PHP Pass1/Pass2 Engine ID (TABLE 5). A 64-bytes Air cipher module-specific section is fetched by subsystem 200 and used for Air cipher encryption using Kasumi/AES/Snow3G core (e.g., Kasumi-F8) and Air Cipher Pass1 Engine ID. A second 64-bytes Air cipher module-specific section is also fetched by CP_ACE and used for Air cipher integrity protection using Kasumi/AES/Snow3G core (e.g., Kasumi-F9) and Air Cipher Pass2 Engine ID. See discussion of FIG. 14 and TABLES 16-17 also.

FIG. 7 also is re-used as a Figure to show a separate example of security context (separately-stored in memory) for Air cipher Inbound, where Authentication (Kasumi-F9) is done first, and followed by Encryption (Kasumi-F8). In this case a same hardware engine is used twice. The order of Authentication/Encryption sections is reversed for Air Cipher Inbound relative to Air Cipher Outbound. In this way, two different Air cipher modes are supported, depending on the configuration or loading of the security context.

In FIG. 7, yet another security context applies analogously to CCM for Inbound or Outbound modes. The control bits track those of the Air Cipher description by analogy, except that for CCM an AES/3DES core is specified.

FIG. 8 shows a block diagram for the security context cache module 510 that is coupled to context RAM 570 in subsystem 200 of FIG. 1, and the block diagram also illustrates a flow of the security context cache working process. In FIG. 8, the security context cache 510 has a DMA module 520 that interfaces with the context RAM 570 and couples to a master interface with context fetch bus VBUSP to access security contexts (as in FIG. 2, 3, 6 or 7) in host memory 120 in FIG. 1. This portion operates as a control-plane structure. DMA 520 is operable for fetch and eviction operations with context RAM 570. A lookup module 530 interfaces with a storage called Lookup RAM for data read/write. Such storage is suitably provided in the context RAM 575 space in FIG. 1. Note also the FIG. 1 parallel buffers 250.5 and 250.15 which can be coupled to modules 520, 530 in FIG. 8 directly or multiplexer-coupled into the cache structure. Thus the cut-through organization is carried consistently into the cache structure.

Cache module 510 in FIG. 8 has three cache port controllers: 1) PA CPPI port controller 540, 2) CDMA CPPI port controller 550, and 3) MMR port controller 560. Arbitration logic 580 supports lookup module 530 by arbitrating any lookup contention for module 530 as between any of the port controllers 540, 550, 560. Arbitration logic 590 supports evict/fetch DMA module 520 by arbitrating any contention for DMA 520 as between any of the port controllers 540, 550, 560. Each of these three port controllers has a set of three control lines with a port prefix followed by _Lookup_Req to activate a lookup request, _EOP_Req to activate an end of packet request, and _Schd_Req to return a scheduling response output. (FIG. 1 shows these control lines in abbreviated manner simply by lines 262, 263 coupling crypto data and scheduler SCR 260 with security context cache module 510.) Each triplet of these control lines is designated by a prefix PA, CDMA, or MMR to indicate that it is coupled to PA CPPI, CDMA CPPI, or MMR block in FIG. 1. Each of the three port controllers 540, 550, 560 has two output lines to convey requests to lookup arbitration 580 and DMA arbitration 590. See, among other controls descriptions elsewhere herein: For PA, see TABLES 26, 28. For CDMA, see TABLES 25, 27. For MMR, see TABLES 21-24. For security context cache operations pertaining to setting up, tearing down, and evicting a security context, see FIGS. 16-18 and TABLE 9.

Turning to FIG. 9, an internal buffer format is depicted. A packet as received from CPPI 210 or 220 as part of ingress flow is chunked into smaller data blocks within subsystem 200 and packed into the buffer, e.g. 265, with the illustrated format. All of the data processing engines in FIG. 1 use and operate on the basis of this FIG. 9 format to access data for their respective processing. Packet data start position is variable and dependent upon length of the CPPI Pre-data Control words section in FIG. 9. If no CPPI Pre-data Control words are present, then packet data starts at offset of 64-bytes. In this example, CPPI Pre-data Control words as formed by Host 100 or PDSP software are 8-bytes aligned. Padding of zeroes is executed, if need be, to achieve 8-bytes alignment.

In FIG. 9, this internal buffer format or chunk buffer begins at a pointer address Buf_Ptr with a Descriptor area (e.g., 24 bytes). Refer also to FIGS. 2 and 3 Host Packet Descriptor discussion. Descriptor area is followed by a SW word area (e.g. 8 bytes, see also TABLE 3 and SW0, SW1). Trailer information called the PS word (32 bytes) and then up-to-128 bytes CPPI pre-data control words such as Command label(s) are next in succession. Then follow a Front Packet Grow region (32 bytes), an up-to-256 bytes chunk of variable length packet data, and a Rear Packet Grow region (32 bytes). (All of the numbers of bytes represent non-limiting examples.)
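The example sizes above are consistent with the 64-bytes starting offset stated for the case of no control words, since 24 + 8 + 32 = 64. The following non-limiting sketch works through that arithmetic and the 8-bytes alignment of the control words. The constant and function names are hypothetical. The sketch follows the stated starting offset and does not model how the Front Packet Grow region is accounted for, which the text leaves open.

```python
DESCRIPTOR_BYTES = 24   # Descriptor area
SW_WORD_BYTES = 8       # SW word area
PS_WORD_BYTES = 32      # PS word
MAX_CTL_BYTES = 128     # upper bound on CPPI pre-data control words

# Fixed prefix ahead of the control words: 24 + 8 + 32 = 64 bytes.
FIXED_PREFIX = DESCRIPTOR_BYTES + SW_WORD_BYTES + PS_WORD_BYTES


def pad_to_8(length: int) -> int:
    """Round a byte count up to the next multiple of 8 (zero padding)."""
    return (length + 7) & ~7


def packet_data_offset(ctl_length: int) -> int:
    """Offset from the buffer pointer at which packet data starts."""
    padded = pad_to_8(ctl_length)
    if padded > MAX_CTL_BYTES:
        raise ValueError("control words exceed the example 128-byte limit")
    return FIXED_PREFIX + padded
```

For instance, packet_data_offset(0) is 64, matching the stated offset when no control words are present, and a 13-byte control section is padded to 16 bytes so that packet data starts at offset 80.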

Each Grow region provides a guard band of buffer space. The Front Packet Grow region provides a degree of protection of CPPI Pre-data Control Words (e.g., Command label(s)) from an error or attack involving the Packet data section in FIG. 9. The Rear Packet Grow region provides a degree of protection of an adjacent chunk buffer space (beyond FIG. 9) from an error or attack that might affect or run-on the size of the Packet data section.

Returning to FIG. 1, data processing engines and security contexts are further detailed. The letter-code legends for lines used in FIG. 1 and FIGS. 10 and 12-14 are:

p=Packet Data
c=Context Data
f=Configuration Data
(none)=Scheduler Data.

In FIG. 1, the data planes and their independent control avoid stalling of either plane by the other plane. Also, host 100 is free to selectively use the data plane without engaging control plane components. Control plane processing in subsystem 200 is carried out in a Packet header processing (PHP) subsystem 410, 460 each as in FIG. 13 and equipped with PDSP (RISC CPU) and associated CDE engine to parse packet headers and define routing for the data plane. PHP PDSP thereby sets up any desired logical topology as illustrated in the FIGS. 4-5 examples and frees up Host 100. In some embodiments, the PHP PDSP program accesses and executes an adapted version of software that would otherwise burden the Host, so that PHP 410 or 460 controls the hardware modules 310, 320, 370 instead, based on the packet headers and based on the security context (e.g., FIGS. 6, 7, TABLE 19) and Ingress data (TABLE 31 and FIG. 9).

Firmware executed on PHP PDSP extracts and inspects security headers as per the security protocol stack (IPSEC/SRTP/3GPP etc) in use to define the action to be carried out on the packet. If the packet passes the header integrity check, then packet header processor PHP subsystem (FIG. 13) sets the route for payload processing within subsystem 200. To set the route for payload processing, PHP adds a Command label CmdLbl in a pre-defined format (e.g. TABLES 4-6) in a data buffer holding a packet or chunk as in FIG. 9. Command label CmdLbl is used by an applicable other hardware module (e.g. Encryption, Authentication, Air Cipher) to forward the packet to the appropriate hardware engine in such module 310, 320 or 370. For instance, the packet can be sent to one of AES, DES, or Galois in Encryption module 310; and/or one of the SHA cores or MD5 in Authentication module 320; and/or one of AES, Kasumi or Snow3G core in Air Cipher module 370. The native processing to which each selected scheduled core is adapted then executes. The results are fed into, between and from modules according to the logical topology or topologies set up by PHP 410 or 460 or both.

In FIG. 1, Data plane processing is carried out by various data processing subsystems, or modules that are partitioned based on nature of processing done by such subsystem or module. Subsystem 200 has three major data processing subsystems, namely 1) Encryption module 310, 2) Authentication module 320 and 3) Air cipher module 370. Packets or chunks thereof are forwarded to the applicable individual data plane module by decoding the command label prefixed in front of the packet chunk (FIG. 9). The command label is attached by control plane, e.g. PHP 410 or 460. Host 100 also can leverage CP_ACE 200 flexibility by selectively engaging any data plane components by prefixing a Command label in or from the packet thereby bypassing PDSP based processing of PHP.

The Encryption module 310 of FIGS. 1 and 10 supports confidentiality by carrying out the task of encrypting/decrypting a payload from a desired offset using hardware encryption cryptographic cores. In FIG. 9, such offset is represented by the expression Bfr_Ptr+64+ctl_length+block data offset. Buffer pointer Bfr_Ptr points to the chunk, and the just-given offset expression points to a portion of the packet data payload in the chunk. Encryption subsystem 310 has an MCE (mode control engine, FIGS. 1, 11), an AES core, 3DES core and Galois multiplier core which are deployed by MCE. Mode control engine MCE in the encryption module 310 implements various confidentiality modes like ECB, CBC, CTR, OFB, GCM etc, see “Soft Operational Modes” block representing MCE operation in FIG. 10.

The Authentication module 320 of FIGS. 1 and 12 provides integrity protection. Authentication module 320 is equipped with SHA1 core, MD5 core, SHA2-224 core and SHA2-256 core to support keyed (HMAC) and non-keyed hash calculations electronically.

The Air cipher module 370 of FIGS. 1 and 14 secures data sent to a wireless device (such as modem 1100 in FIG. 20) over the air by using wireless-infrastructure-defined cryptographic cores like Kasumi or Snow3G. This module 370 is also used to decrypt the data as received from air interface modules.

Further in FIG. 1, the control and data plane processing engines 410, 460, 310, 320, 370 each have lines to context RAM 570 to access or store/update the control information pertaining to each logical connection. Context RAM 570 holds the information like Keys, IV, partial data, etc., for each active security context (e.g., as in FIG. 2, 3, 6, or 7). Cryptographic engine CP_ACE provides and can store up to e.g., 64 or more context-identifying numbers on-chip based on the desired performance. Context RAM 570 is coupled with Security Context Cache module 510 (FIG. 8) to fetch the context information from external memory 120 to populate the active context on a real-time demand basis.

In FIG. 1, subsystem 200 accepts packets on respective 32-bit PA and CDMA Streaming buses PA_Str and CDMA_Str respectively feeding a PA (packet accelerator) Ingress CPPI Streaming Interface port 210 and a CDMA Ingress CPPI Streaming Interface port 220 as part of ingress flow. Each packet destined to subsystem 200 is prefixed with at least 8-bytes of CPPI Software Word (for FIG. 9) that holds information about security context to uniquely identify security connection and associated security parameters. See TABLE 31. Coherency is maintained by CPPI DMA. Word order of operations is in-order so that each new packet starts after a last (previous) packet is completely fetched by CP_ACE. Egress is handled by a PA Egress CPPI streaming interface 270 and a CDMA Egress CPPI streaming interface 280 on the other side or output side of the Crypto Data and Scheduler SCR 260 that has numerous 64-bit registers.

Regarding the input side of Crypto Data and Scheduler SCR 260, notice that nine FIFO (first in first out) buffers 250.i or queues support: A) the Security Context Cache module 510 with a pair of such buffers 250.5, 250.15 for important parallelism and control bandwidth, and B) one buffer for each of the two Ingress CPPI Streaming Interfaces for PA and CDMA, C) one buffer each (250.1, 250.11) for IPSEC PHP and Air Cipher PHP, and D) one buffer each (250.3, 250.4, 250.7) for the hardware modules or engines (e.g. Encryption 310, Authentication 320, Air Cipher 370) and buffers 250.2, 250.6 for the IPSEC PHP 410 and Air Cipher PHP 460 respectively.

Crypto Data and Scheduler SCR 260 has an associated Packet RAM 265 and an associated Block Manager Module 380. Crypto Data and Scheduler SCR 260 has respective outputs coupled to IPSEC PHP 410 and Air Cipher PHP 460, and to the Encryption, Authentication, and Air Cipher hardware modules 310, 320, 370, as well as outputs to the PA Egress CPPI streaming interface 270 and the CDMA Egress CPPI streaming interface 280, and an output line (when included) directly external to CP_ACE.

Security Context Cache module 510 has inputs for context Ctx Fetch by a 128-bit VBUSP bus, and two 64-bit wide lines 262, 263 from Crypto Data and Scheduler SCR 260. Security Context Cache module 510 has a context data line coupled to Context RAM 570, as do each of IPSEC PHP 410 and Air Cipher PHP 460, and the Encryption, Authentication, and Air Cipher hardware modules 310, 320, 370. Context RAM SCR 570 in turn is coupled to three banks of Context RAM 575.

A Configuration SCR 350 store receives 32-bits input from a Configuration VBUSP bus. Configuration SCR 350 supplies or is accessed for Configuration data for each of IPSEC PHP 410 and Air Cipher PHP 460, as well as providing Configuration data for each of RNG, PKA, MMR registers and two banks of Configuration RAM.

Packets are fetched to subsystem 200 via CPPI CDMA using, e.g., two ingress channels and sent out of CP_ACE via, e.g., 16 egress channels (threads). Crypto Data and Scheduler SCR 260 internally breaks up a received packet on-the-fly from either Ingress port (PA 210 or CDMA 220) into data chunks. Each data chunk can hold a maximum of e.g. 256-bytes of packet payload. Six banks of packet RAM 265 support Crypto Data and Scheduler SCR 260. This chunking operation is provided to fully engage the hardware engines in modules 310, 320, 370 and to reduce internal buffer (RAM) spaces 250.i. Chunking also promotes efficient, low-latency cut-through mode operations in subsystem 200 wherein the packet data is processed as and when received without waiting for a given whole packet to be completely received and stored.
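
The chunking step can be illustrated with a minimal Python sketch. It assumes the example 256-byte maximum chunk payload given above and tags each chunk with the SOP and EOP flags of TABLE 7. The function name chunk_packet and the tuple layout are hypothetical and are not part of the described hardware.

```python
from typing import Iterator, Tuple

MAX_CHUNK_BYTES = 256  # example maximum chunk payload taken from the text


def chunk_packet(payload: bytes,
                 max_chunk: int = MAX_CHUNK_BYTES) -> Iterator[Tuple[bool, bool, bytes]]:
    """Yield (sop, eop, data) tuples for one packet, in arrival order.

    sop is True only for the first chunk and eop only for the last chunk,
    so a packet that fits in one chunk yields a single (True, True, data).
    """
    total = len(payload)
    for start in range(0, total, max_chunk):
        end = min(start + max_chunk, total)
        yield (start == 0, end == total, payload[start:end])
```

Because the generator yields each chunk as soon as its bytes are available, a consumer can begin processing before the whole packet has been seen, which loosely mirrors the cut-through behavior described above.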

The initial route in Ingress flow within subsystem 200 is determined by an Engine ID that is extracted from the CPPI software word SW in FIG. 9 and described hereinbelow, see also TABLES 3 and 5. Subsequent sequence processing of the data chunk is determined by the command label prefixed to the chunk (FIG. 9, TABLES 4-6) by Host 100 or PHP (packet header processor) module 410 or 460 of FIG. 13. The command label (TABLE 4) holds the engine select codes of TABLE 5 with optional parameters. Multiple command labels can be cascaded (TABLE 6) to allow a chunk to be routed to multiple engines within subsystem 200 to form a logical processing chain. Optional parameters of a command label provide control information pertaining to each processing engine.

CP_ACE allows processing of interleaved data chunks, but always ensures that chunks of a given packet follow the same route within the system thereby maintaining packet data coherency. Chunks are routed to next engine based on command label, and a chunk can be routed back to a same engine for second stage processing. Once chunks are processed they are queued for Egress to exit subsystem 200. Subsystem 200 has two physical egress ports 270, 280 (PA and CDMA). Internal hardware structure ensures that packets entering PA Ingress port 210 can only exit PA Egress port 270; likewise packets entering CDMA Ingress port 220 can only exit CDMA Egress port 280. As packets internal to subsystem 200 are processed in chunks, chunks belonging to different packets may cross each other in time, i.e. a data chunk of a last received packet may come out first on Egress before a first packet data chunk. Hence, CP_ACE has 16 Egress CPPI DMA channels, and internal hardware ensures that all data chunks belonging to an individual packet go out on a same Egress CPPI DMA channel (thread). The internal hardware maintains packet data coherency on a given CPPI DMA channel.

Subsystem 200 also hosts TRNG (True Random Number Generator) and PKA (Public Key Accelerator) modules that can be accessed via memory-mapped registers by IPSEC PHP 410 PDSP, Air Cipher PHP 460 PDSP, or by Host 100 to aid key generation and computation.

CPPI software words SW are formed and attached to a packet (e.g., chunk in FIG. 9) by a packet queuing entity. SW Word0 and SW Word1 of CPPI hold the information to associate the current packet to a security context. SW Word2 is optionally used to specify destination CPPI queue.

In TABLE 3, a single bit is sufficient for Present info and each flag, otherwise multiple bits are provided.

TABLE 3 CPPI SW Word0
Field | Width
CPPI Destination Info Present |
Command Label Present |
Command Label Offset | Multiple bits
Engine ID | Multiple bits
Evict, Tear, NoPayload Flags |
Security Context ID (SCID) | Multiple bits

In TABLE 3, the CPPI Destination Info Present flag indicates that SW word2 is holding CPPI destination queue information thereby detailing the flow index on ingress and free queue number or thread to be used on egress when sending this packet out to CPPI after processing. (Compare also with TABLE 21 and with TABLES 25-28 _thread_id and _req_thread_id controls for CPPI I/Fs, and see TABLE 31 Word 2 Flow index description.) The Engine ID field selects the first processing accelerator engine within the subsystem 200. The Engine ID field is used, for instance, if host 100 is about to send data directly to one or more data plane processing engines (Encryption 310, Authentication 320, Air Cipher 370, or cores in any of them) without involving a control plane engine IPSEC PHP 410 or Air Cipher PHP 460. Host 100 may be programmed to insert a default engine ID code PA_ENG_ID or CDMA_ENG_ID that directs the hardware to select the first processing engine from the programmed memory-mapped register MMR (FIG. 1, TABLE 21) defined for that ingress interface. The Command label info field has the Command Label Present flag and multi-bit Command Label Offset. The most significant bit (MSB) of the command label info is the Command Label Present flag, indicating that command label has been formed by Host. The Command Label Offset (Cmd Label Offset, PS info) is defined from the start of the CPPI Pre-data Control words section (see FIG. 9, TABLES 3, 7) where an engine-specific command label (if any) has been formed. (CPPI Pre-data Control words section is called Control section for short, elsewhere herein.) Host 100 uses such command label when directly engaging the data processing engines without involving control plane engine 410 or 460. Command Label Offset is address aligned on and specified in 8-bytes units.

Evict, Teardown and No-Payload flags in TABLE 3 are used to override the default behavior of the context cache module 510 (FIG. 8).

In TABLE 3, Security Context ID (SCID) has MSB bit as its First Tier bit and the remaining bits as a security index (SCIDX). MSB bit (First Tier) being set indicates that this is a First Tier connection. Context cache module (FIG. 8) uses the multi-bit security Index (SCIDX) to search an internal table for a locally cached security context. If the search is successful, then the locally cached security context is used to process the packet, else a DMA fetch request is issued from a 32-bits security context pointer SCPTR in CPPI SW word 1 to internal cache memory to populate the security context. 32-bit security context pointer SCPTR in CPPI SW word 1 is a 64-bytes-aligned physical external memory address that is used to fetch a particular security context (e.g., as in FIG. 6 or 7) from external memory 120. (SCPTR also is in SCCTL of FIG. 6 and in TABLE 10.)

Optional CPPI SW word 2 has three fields utilized when host 100 is directly engaging data processing engines with no PHP involved. Egress CPPI Destination Queue number has multiple-bits to select the Egress destination CPPI Queue to be used after subsystem 200 processing and therefore the Host supplies this parameter to select CPPI destination queue. Egress CPPI Flow Index field holds a CPPI flow index for Egress CPPI transfers. Egress CPPI Status length field provides CPPI streaming status data, such as for the Authentication engine 320 (FIG. 12). This field specifies a number of 4-bytes aligned bytes to send as CPPI streaming status that appears in CPPI PS section at Host 100.

TABLE 4 shows a Command label format or structure for PHP PDSP or Host to issue to the data plane processing engines (Encryption 310, Authentication 320, Air Cipher 370 module in FIG. 1). The command label structure is PDSP friendly, so that each PHP can rapidly populate the fields in the command label structure. In FIG. 9, the first data block (chunk) of a packet is prefixed with a Command label that holds the information about the processing to be carried out on the payload by data plane processing engines 310, 320, or 370 and specified crypto cores therein. Non-first data blocks (chunks) of the packet can also optionally contain a Command label to pass in-line instructions to the selected data plane processing module 310, 320, or 370. The Command label contains a Next Processing Engine select code followed by the optional control information meant for selected data plane processing engine or crypto core. A Command label can be attached (prefixed) by the packet header processing PHP module or by Host 100 thereby setting the sequence of processing (logical topology, e.g. of FIG. 4 or 5) on header and payload within CP_ACE. Host prefixes the Command label when host 100 is to engage data plane processing components without involving control plane components within subsystem 200. In the TABLE 4 Command label, the Next engine select code is followed by length fields, offset fields, option encoding and option bytes. Up to e.g. three options can be specified in the option bytes field of the Command label. Each option ends at 8-bytes boundary. Padding of zeroes is added to align to a boundary of 8 bytes when padding is needed to do so. A first data block (FIG. 9 Packet data section in the chunk) follows the Command label.

TABLE 4 COMMAND LABEL FORMAT
Next Engine ID select code | Command label length | Length to be processed (16-bits) | SOP bypass length | Options control info (24-bits)
Option A MSB byte (8-bits) byte 0 | Option A byte 1 | Option A byte 2 | Option A byte 3 | Option A byte 4 | Option A byte 5 | Option A byte 6 | Option A byte 7
Option A byte 8 | Option A byte 9 | Option A byte 10 | Option A byte 11 | Option A byte 12 | Option A byte 13 | Option A byte 14 | Option A LSB byte (8-bits) byte 15
Option B MSB byte 0 | Option B byte 1 | Option B byte 2 | Option B byte 3 | Option B byte 4 | Option B LSB byte 5 | Padding

TABLE 5 describes the bits of a Next Engine ID, used to decode the next processing. In some embodiments, Next Engine ID bit fields are substituted for any one, some or all of these Next Engine ID bits. Each activated bit is decoded to activate the corresponding engine that is signified. The decoder is responsive to activation of multiple bits to activate the corresponding engines.

TABLE 5 NEXT ENGINE ID BITS
ENGINE ID BIT | ENGINE DESCRIPTION
Default Ingress Engine ID | Host inserts default engine ID select code, in this scenario the hardware picks up first processing engine from the programmed MMR memory-mapped register defined for that ingress interface.
Encryption Module Pass 1 | Engine to carry out Encryption/decryption. This engine has AES, DES, Galois core along with mode control engine MCE.
Encryption Module Pass 2 | Pass 2 for Encryption/decryption engine in CCM mode wherein two levels of encryption processing are executed.
Authentication Module Pass 1 | Engine to carry out Hashing operation has SHA1, MD5 and SHA2 cores.
Authentication Module Pass 2 | Code for Pass2 Authentication in case payload is routed again to Authentication module.
IPSEC Header processor Pass 1 | Engine to carry out IPSEC header packet processing holds PDSP that carries out IPSEC protocol-specific header operation. In Pass 1 the packet header is parsed and inspected.
IPSEC Header processor Pass 2 | Pass 2 for IPSEC header packet processing updates and acknowledges the result from payload processing module.
Output Port 1 | Egress module 1. This is used to send data out of subsystem 200.
Air Cipher Module Pass 1 | Engine for air cipher processing. Pass1 has, e.g. AES, Kasumi and Snow3G cores.
Air Cipher Module Pass 2 | Pass2 for air cipher module, e.g. in GCM/CCM mode
SRTP/Air cipher Header processor, Pass 1 | Engine to carry out SRTP/Air Cipher packet header processing. The engine holds PDSP to carry out SRTP/Air cipher protocol-specific header operation.
SRTP/Air cipher Header processor Pass 2 | Pass 2 for SRTP/Air cipher header packet processing. Pass 2 updates and acknowledges result from payload processing module.
Output Port 2 | Egress module 2 is used to send data from subsystem 200.
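
The one-bit-per-engine decoding described for TABLE 5 can be sketched in Python as follows. The bit positions and the subset of engines shown are purely illustrative assumptions, since the text does not assign bit numbers; the names NextEngine and decode_next_engines are likewise hypothetical.

```python
from enum import IntFlag
from typing import List


class NextEngine(IntFlag):
    # Bit positions below are assumptions for illustration only.
    ENCRYPTION_PASS1 = 1 << 0
    AUTHENTICATION_PASS1 = 1 << 1
    IPSEC_PHP_PASS1 = 1 << 2
    AIR_CIPHER_PASS1 = 1 << 3
    OUTPUT_PORT_1 = 1 << 4


def decode_next_engines(engine_id: int) -> List[str]:
    """Return the name of every engine whose bit is set in engine_id.

    More than one bit may be set, in which case several engines
    are reported, matching the multi-bit activation described above.
    """
    return [e.name for e in NextEngine if engine_id & int(e)]
```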

In TABLE 4, a multi-bit Engine Header Length field within a Command label indicates the engine-specific Command label length. This length mainly indicates total number of option bytes present plus (or beyond) the 8-bytes of command label. A longer multi-bit Length To Be Processed field allows the hardware engines to bypass data towards end of data block and indicates the total number of bytes to be processed after bypassing SOP Bypass Length for a current packet. A value of all ones implies that all valid bytes within the current packet are processed through end-of-packet EOP from the given bypass length. A value of all zeros directs that the current packet be skipped from processing. This length is valid in the SOP chunk.

SOP Bypass Length indicates the number of bytes to be ignored from beginning of packet before processing the data. All data before SOP bypass length is bypassed. This length is specified in bytes. This feature allows a hardware engine to bypass/ignore that data at start of packet.
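
How the two length fields combine can be shown with a small Python sketch. It assumes the 16-bit Length To Be Processed width shown in TABLE 4, so that all ones is 0xFFFF. Clamping to the packet length is an added assumption, and the function name processing_window is hypothetical.

```python
from typing import Optional, Tuple

ALL_ONES_16 = 0xFFFF  # all-ones value of the 16-bit Length To Be Processed field


def processing_window(packet_len: int, sop_bypass_len: int,
                      length_to_process: int) -> Optional[Tuple[int, int]]:
    """Return the (start, end) byte range to process, or None to skip the packet."""
    if length_to_process == 0:
        return None  # all zeros: the packet is skipped from processing
    start = min(sop_bypass_len, packet_len)  # bytes before start are bypassed
    if length_to_process == ALL_ONES_16:
        end = packet_len  # all ones: process through end of packet
    else:
        end = min(start + length_to_process, packet_len)  # clamp (assumption)
    return (start, end)
```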

Further in the Command label of TABLE 4, Options Control Info specifies the length and Context RAM offset of data that is carried in option bytes. Options Control Info is decoded by selected processing engine to extract the data from option bytes and populate context RAM 570, 575. Multiple different options can be specified in single command label to pass control/messaging information to selected processing engine. Options Control Info has the following multi-bit fields:

Option-A Length specifies the length in units of 8-bytes of option-A bytes present in an Option Bytes area of a Command label. Value of 0 implies option-A is not present. Value of all 1's implies an Option-A Length of 64-bytes.

Option-A Context Offset specifies the offset in units of 8-bytes from start of engine-specific security context section (e.g., Encryption module-specific section or other module-specific section in FIG. 6 or 7) where the Option Bytes area of a Command label is written. Option-B/Option-C Length and Option-B/Option-C Context Offset have analogous meanings as noted for Option-A. Option-A is packed first, then Option-B and then Option-C and then additional options, if any.

Option Bytes holds the data as specified in the engine option bytes encoding, and used to pass in-band control or message information from control plane processing components to data plane components on a per-packet or per-chunk basis. (In-band or in-line refers to control/message signaling sent with or accompanying the data to be processed.) Each option ends at an 8-bytes boundary, and zeroes are padded to align the data if the actual bytes are misaligned. Option bytes are extracted and populated into a security context before a packet is processed so that the specified option bytes are made effective for the current data packet.
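
The 8-byte alignment rule for option bytes can be sketched in Python as below. The helper names pad_option and option_length_units are hypothetical; the sketch only shows zero padding to the next 8-byte boundary and the resulting length in 8-byte units, and it does not model the encoding of the Options Control Info field itself.

```python
OPTION_ALIGN = 8  # each option ends at an 8-byte boundary, per the text


def pad_option(data: bytes) -> bytes:
    """Append zero bytes so the option ends on an 8-byte boundary."""
    remainder = len(data) % OPTION_ALIGN
    if remainder == 0:
        return data
    return data + bytes(OPTION_ALIGN - remainder)


def option_length_units(data: bytes) -> int:
    """Length of the padded option expressed in units of 8 bytes."""
    return len(pad_option(data)) // OPTION_ALIGN
```

For instance, the 6-byte Option B of TABLE 4 would occupy one 8-byte unit after two bytes of padding.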

Notice that this embodiment's use of the bytes after the Next Engine ID not only promotes packet processing efficiency but also communicates metadata or access data to control data extraction and writing of respective option data from the Command label into the corresponding engine-specific area of a security context, such as in FIG. 6 or 7. In this way, a type of sandwiched or interlocked process embodiment partially constructs or contributes to a security context for FIG. 6 or 7 directly, and also constructs the packet (or chunk) information of FIG. 9 that includes the Command label and the Software Word SW, and then further contributes to and completes the security context for FIG. 6 or 7 using the Command label and the Software Word SW of FIG. 9. Thanks to the interlocked process, the CPPI Pre-data Control words prefixed to the packet or chunk itself are remarkably used to contribute to the security context to which the Software Word associates the packet or chunk, and thereby also enhance overall system security and resistance to attack. Moreover, neither the process contribution that partially constructs the security context nor the contribution from the CPPI Pre-data Control words that completes the security context is sufficient in itself to provide a security context with which successful cryptographic processing can occur. Furthermore, the particular instruction contents and instruction sequences executed by MCE provide even a third level of security and flexibility.

Some other embodiment might provide core ID (e.g. AES, DES, Galois, etc) and crypto mode parameters as what might be called option data for a particular engine ID. The security context for Authentication block 320 is populated somewhat that way, see description of FIG. 12 and TABLE 15. By contrast, this embodiment primarily or instead uses MCE software instructions based on a remarkable instruction set described later hereinbelow to flexibly handle such matters of core ID and establishing Crypto mode in modules 310 and 370, see e.g. TABLES 13, 14 and 32. Authentication block 320 lacks MCE and MCE instructions, although it can be called by an MCE, and the security context for Authentication 320 is completed in a somewhat different way than for the Encryption 310 and Air Cipher 370. Therefore, subsystem 200 may be characterized as a mixed embodiment or as actually including two embodiments for security context formation. Moreover, in a logical topology in which the Authentication is cascaded with encryption or decryption, system security is still further enhanced by the distinct additional step in the security context formation to support authentication. Put another way, the architectural diversity in the subsystem 200 embodiment contributes to security and flexibility.

Description at this point returns to the examples of command labels themselves.

In TABLE 6, multiple command labels are cascaded to allow a packet payload to be routed to multiple data plane processing engines within a subsystem to form a logical processing chain (a multi-turn coiled logical topology, cf. FIGS. 1, 4 and 5). As noted in connection with TABLE 4, a first data block (Packet data section protected by Front packet Grow region) follows in FIG. 9 after the Command label of TABLE 6.

Comparing the particular examples represented by TABLES 4 and 6, note that TABLE 4 shows a 16-byte Option A and a 6-byte Option B. TABLE 6 shows an 8-byte Option A, a 14-byte Option B (end-padded), and then a 16-byte Option C. In both TABLES 4 and 6, the column headings “Next engine select code | . . . | Options Control Info” are not included in the electronic form of the command labels. Many particular examples of command labels and cascaded command labels may be established without altering a given hardware implementation of subsystem 200.

TABLE 6 CASCADED MULTIPLE COMMAND LABELS
Next engine select code | Command Label Length | Length to be Processed | SOP Bypass Length | Options Control Info
Option A MSB byte 0 | Option A byte 1 | Option A byte 2 | Option A byte 3 | Option A byte 4 | Option A byte 5 | Option A byte 6 | Option A LSB byte 7
Option B MSB byte 0 | Option B byte 1 | Option B byte 2 | Option B byte 3 | Option B byte 4 | Option B byte 5 | Option B byte 6 | Option B byte 7
Option B byte 8 | Option B byte 9 | Option B byte 10 | Option B byte 11 | Option B byte 12 | Option B LSB byte 13 | Padding
Option C MSB byte 0 | Option C byte 1 | Option C byte 2 | Option C byte 3 | Option C byte 4 | Option C byte 5 | Option C byte 6 | Option C byte 7
Option C byte 8 | Option C byte 9 | Option C byte 10 | Option C byte 11 | Option C byte 12 | Option C byte 13 | Option C byte 14 | Option C LSB byte 15

In TABLE 7, a Scheduler Control Word is used to hand over each data block that is being transferred from one processing engine to another within the subsystem 200. This word is used by the hardware engines to decode the length and location of packet and security context along with other control information. This Scheduler Control Word is uniformly used by the hardware engines to communicate and pass each data block to each other, so PDSP is presented a reformatted, firmware-friendly view of this word. Notice that such passing in an embodiment can occur in the sense of control, with or without actually transferring a data block between different storage spaces within the subsystem 200.

TABLE 7 SCHEDULER CONTROL WORD
FIELD NAME | DESCRIPTION
Block Data Length | Number of actual valid bytes present in Packet Data section of FIG. 9 buffer.
CTL_Length | Number of actual valid bytes present in CPPI Pre-Data Control Words section of buffer.
PS Length | This field indicates the actual valid bytes present in Trailer Info (PS word) section of buffer.
Block Data Offset | Number of offset valid bytes present in Front Packet Grow Region of FIG. 9 buffer, used if non-first chunk data increases due to previously captured partial bytes.
Ingress Port | Ingress source port, 0 = PA port, 1 = CDMA port
Single Chunk Packet | Single chunk packet flag.
Drop | Drop packet bit indicates to drop current packet at Packet Egress, hence no processing. All data processing engines record this drop bit to bypass all chunks belonging to this packet. This bit is only set by Firmware and is not altered by any data processing engine.
Ramidx | Ram index is multi-bit context RAM 575 address value or index used to uniquely identify and associate security context with packet/chunks in packet RAM 265. This value is established validly for every packet chunk.
Error Code | Error code is used to pass error condition from data processing engine to Firmware. TABLE 8 details error codes reportable by various data processing engines. All data processing engines bypass chunks of current packet if error code is non-zero.
Buffer ID | Buffer ID is used to locate internal data buffer for data processing engine to use to read chunk data such as from packet RAM 265. This address or index value is established validly for every packet chunk.
SOP | Start of Packet, bit when set indicates that current chunk is first chunk of given packet. Actuate parsing of buffer in FIG. 9 for more extensive Pre-data information.
EOP | End of Packet, bit when set indicates that current chunk is last chunk of given packet.
Egress Status Flags | These 4 bits are used to pass CPPI Error Code from Firmware. Error Codes (TABLE 8) can be changed even at last chunk of packet. Egress module reports last-reported Egress status flag as CPPI Error Code with EOP.
Cmd Label Offset | These 4 bits indicate the position in units of 8-bytes of Command label within Control CTL section of buffer (FIG. 9).
Command Label Present | This bit informs data processing engine if Command Label is present or not. If absent, data processing engine uses info from security context to process and forward current chunk.
Engine ID | Current engine ID used to route data chunk within subsystem 200.

TABLE 8 describes Error Codes.

TABLE 8 ERROR CODES GENERATION
Error Code | Description
ERR_CTX_SOP | Context cache lookup failed for non-SOP lookup request, e.g., SOP chunk was marked as bad. In normal operation the non-SOP lookup does not fail as CP_ACE module 510 ensures that context is not evicted until all outstanding chunks are processed.
ERR_DMA_OWNERSHIP | Owner bit set to Host while fetching security context from host memory 120. Host 100 ensures that owner bit is set to “CP_ACE” before queueing any packets.
ERR_CTX_IDRECYCLE | Host ensures security context ID is properly recycled and no outstanding packets for recycled context ID remain. The error is generated if packet lookup requests appear after context has been marked as “to be torn down” and CP_ACE has not yet completed the teardown operation. See also Tear Down process in FIG. 17.
ERR_CTX_AUTOFETCH | If context cache module 510, 570 is operated in Auto-fetch disabled mode, then host 100 ensures that security context is cached before packets arrive for that particular context. This error is generated if Auto-fetch is disabled and no locally cached security context is found.
ERR_ENCR_NOCMDLBL | Encryption module 310 received SOP data chunk with no command label at least for first data chunk.
ERR_AUTH_NOCMDLBL | Authentication module 320 received SOP data chunk with no command label at least for first data chunk.
ERR_AIRC_NOCMDLBL | Air Cipher module 370 received SOP data chunk with no command label at least for first data chunk.

Description now details the Block Manager module 380 of FIG. 1. Block Manager module 380 allocates or frees internal buffer (blocks for use as in FIG. 9) and Thread-IDs (for use as in TABLES 25-28). Within the system each respective CPPI Ingress module 210, 220 requests Block Manager 380 for allocation of buffer blocks, e.g. in Packet RAM 265, to pack an incoming packet data stream for chunking. The corresponding CPPI Egress module 270 or 280 signals Block Manager 380 to return each used buffer block back to a free pool. Similarly, each CPPI Ingress module 210, 220 requests Block Manager 380 for a thread-ID if it encounters a packet having a size that is greater than e.g. 252-bytes, and each corresponding CPPI Egress module 270 or 280 subsequently signals Block Manager 380 to free-up the allocated thread-ID when such packet is fully processed. Block Manager module 380 has one slave VBUSP bus interface for such allocate requests and free-up signaling to be made via this interface. An allocate request (VBUSP Write) to address 0x0 is deemed by the Block Manager circuit to be from PA CPPI Ingress port 210, whereas an allocate request (VBUSP Write) to address 0x08 is deemed by the Block Manager 380 circuit to be from the CDMA CPPI Ingress port 220. A Free-up request (VBUSP Read) from either PA CPPI Egress port 270 or CDMA CPPI Egress port 280 is made to address 0x0. Block Manager module 380 maintains two independent pools or storage spaces, one for PA packet flow and other for CDMA packet flow, to ensure that a stall in one of the flows does not impact the other flow. For instance, if PA Egress 270 is back-logged due to descriptor unavailability, this will only impact PA path by exhausting all available free buffers from PA pool of Block Manager 380. But CDMA Egress 280 flow will continue to receive free buffers from its dedicated pool maintained by Block Manager 380. The number of free buffers in each pool is configurable via FIG. 1 memory-mapped registers MMR. Block Manager 380 ensures that at least 4 buffers (1 bank) are allocated to each pool even if MMR configuration is set to 0 buffers for the selected pool.
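
The two-pool behavior can be modeled with a short Python sketch. This is a software analogy under stated assumptions and not the hardware design: the class name BlockManagerModel, the string port keys, and the per-pool buffer numbering are all hypothetical, and the VBUSP addressing and thread-ID allocation are not modeled.

```python
from typing import Dict, List, Optional

MIN_BUFFERS_PER_POOL = 4  # one bank is guaranteed to each pool, per the text


class BlockManagerModel:
    """Toy model of independent PA and CDMA free-buffer pools."""

    def __init__(self, pa_buffers: int, cdma_buffers: int) -> None:
        # A configured size below the minimum is raised to the minimum.
        self._free: Dict[str, List[int]] = {
            "PA": list(range(max(pa_buffers, MIN_BUFFERS_PER_POOL))),
            "CDMA": list(range(max(cdma_buffers, MIN_BUFFERS_PER_POOL))),
        }

    def allocate(self, port: str) -> Optional[int]:
        """Take a buffer from the pool of the given port, or None if exhausted."""
        pool = self._free[port]
        return pool.pop() if pool else None

    def free(self, port: str, buffer_id: int) -> None:
        """Return a used buffer to the free pool of the given port."""
        self._free[port].append(buffer_id)
```

Exhausting the "PA" pool in this model leaves "CDMA" allocations unaffected, which is the isolation property the paragraph above describes.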

Returning to FIG. 8, Security Context Cache module 510 populates FIG. 1 Context RAM SCR 570 based on ingress Security Context ID and type of context, and smart-evicts and fetches security context to/from external memory 120 as and when appropriate. Hardware based lookup of cached security context from context RAM 570, 575 increases speed of performance. The Context Cache module 510 supports two tiers of context. First Tier contexts have permanent residence in context RAM 570, 575 until affirmatively evicted (TABLE 9) by a processor such as Host external to module 510 and are not auto-evicted by module 520 therein. The module 520 can force eviction and force teardown of a security context by an auto-eviction process on contexts other than First Tier. The processes of populating and evicting of a security context are supported by and have associated memory-mapped register MMR fields, see e.g. TABLES 23-24. An Ownership bit (TABLE 10) for cache coherency is checked and updated.

In FIG. 8, Security Context Cache module 510 of FIG. 1 operates to auto-fetch security context from external memory 120 and associates the security context with an ingress packet using SCPTR. This context cache module 510 beneficially allows any number of simultaneous security connections by not only caching up to a limited number of contexts on-chip (in subsystem 200 blocks 570, 575) but also fetching other contexts as and when requested for processing. Context cache module 510 does the task of fetching and associating a security context with each ingress packet. Context cache module 510 populates Context RAM 570, 575 with data to/from the external memory 120 based on the security context parameters. Context cache module 510 carries out auto-evict and auto-fetch operations to allow free space for new connections.

As discussed hereinabove, context cache module 510 allows two tiers of security connections to facilitate fast retrieval for performance critical connections. Each security context of the First Tier has permanent residence within Context RAM 570, 575 for fast retrieval and is not evicted automatically by context cache module 510. Instead, Host 100 has the option to force eviction (TABLE 9). First Tier connection is established by setting a First Tier bit (TABLE 3, in SCID) while setting up the security context. Second Tier connections are maintained or kept while space is available within Context RAM 570, 575. Then if the context RAM space becomes full, a new fetch request for a new security context automatically evicts (FIG. 8 module 520) one or more of the Second Tier connections into external memory 120 to allow free space to populate the new security context into the context RAM space. Each access request to Context Cache module 510 along with security parameter SCIDX triggers a search in an internal cache table to determine the action. If lookup 530 fails, then a DMA 520 operation is started to populate the requested security context into the context RAM space of the cache; else if lookup 530 succeeds, the already-cached version of the requested security context is used for processing the packet for which that security context is requested.

In FIGS. 1 and 8, the Context Cache hardware 510, 570, 575 employs a process to manage caching of security context. This hardware implements a four-way cache where the LSB 4-bits of SCIDX in context-ID (SCID) act as the cache way-select control. Once the cache way has been identified, then four comparisons are performed within the selected cache way to look for a security ID match. If security ID (SCIDX) matches with any of the four stored cache ways, then the security context is recognized as locally cached. But if lookup/match fails, then security context is fetched by DMA 520 using pointer SCPTR from FIG. 9 CPPI SW word 1, and the first empty cache way is marked with data from current security context. If lookup finds no empty slot within the selected cache way, then module 520 hardware evicts the last non-active security context which is non-First Tier. In order to avoid deadlock, hardware does not allow marking all four contexts within a given cache way as First Tier. The last First Tier request is ignored if remaining three contexts are First Tier. In order to efficiently use the caching mechanism, a linear incremented security context ID is used for new connections. It should be understood that other context cache policies are also feasible in various embodiments.
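
The lookup and eviction policy just described can be approximated by the following Python sketch. It is a simplified software model under several assumptions: the 4 LSBs of SCIDX select one of 16 groups of four slots, every resident context is treated as non-active (outstanding-packet tracking is not modeled), "last" is read as the highest-numbered slot, and the DMA fetch is represented only by the "miss" return value. The class and method names are hypothetical.

```python
from typing import List, Tuple


class ContextCacheModel:
    NUM_GROUPS = 16      # selected by the 4 LSBs of SCIDX
    SLOTS_PER_GROUP = 4  # four comparisons per lookup

    def __init__(self) -> None:
        # Each slot holds (scidx, first_tier).
        self.groups: List[List[Tuple[int, bool]]] = [
            [] for _ in range(self.NUM_GROUPS)
        ]

    def access(self, scidx: int, first_tier: bool = False) -> str:
        """Return "hit" if scidx is cached, else cache it and return "miss"."""
        slots = self.groups[scidx & 0xF]
        if any(entry[0] == scidx for entry in slots):
            return "hit"
        pinned = sum(1 for entry in slots if entry[1])
        if first_tier and pinned >= self.SLOTS_PER_GROUP - 1:
            first_tier = False  # never let all four slots become First Tier
        if len(slots) == self.SLOTS_PER_GROUP:
            # Evict the last slot that is not First Tier; one always exists.
            for i in range(len(slots) - 1, -1, -1):
                if not slots[i][1]:
                    del slots[i]
                    break
        slots.append((scidx, first_tier))
        return "miss"
```

Because at most three slots in a group can be First Tier, the eviction scan always finds a victim, which is the deadlock-avoidance property noted above.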

Context cache module 510 has or is provided with the security context pointer SCPTR (see, e.g., FIG. 9 in SW1, FIG. 6 in SCCTL, TABLE 10), and the security context ID (SCID, TABLE 3), along with control flags and other data with each cache access request by an engine 310, 320, 370, 410, or 460. Security context pointer SCPTR is a physical external memory 120 address that is used to fetch security context. The format of the security context is in FIG. 6 or 7 and the format of the security context control word SCCTL is defined in TABLE 10. SCPTR is a 64-bytes aligned system address, for instance. Security context ID (SCID) has MSB bit as First Tier bit and remaining 15-bits as security index SCIDX, see also discussion of TABLE 3. Context cache module 510 uses 15-bits security Index (SCIDX) to search an internal table for a locally cached security context. If search is successful, then the locally cached security context is used to process the packet associated to it; else a DMA 520 fetch request is issued from or based on the 32-bits security context pointer (SCPTR) to populate the requested security context from host memory 120 into internal cache memory 570, 575. Context cache module 510 supports passing control flags along with a request to it to override its default behavior. Control flags are named Force Evict, Force Tear Down and SOP.
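
The SCID layout and the SCPTR alignment rule stated above can be expressed in a few lines of Python. The sketch assumes a 16-bit SCID (one First Tier bit plus the 15-bit SCIDX) and a 32-bit SCPTR; the function names split_scid and is_valid_scptr are hypothetical.

```python
from typing import Tuple

FIRST_TIER_MASK = 0x8000  # MSB of the 16-bit SCID
SCIDX_MASK = 0x7FFF       # remaining 15 bits
SCPTR_ALIGN = 64          # SCPTR is a 64-byte-aligned address


def split_scid(scid: int) -> Tuple[bool, int]:
    """Split a 16-bit SCID into (first_tier, scidx)."""
    if not 0 <= scid <= 0xFFFF:
        raise ValueError("SCID must fit in 16 bits")
    return bool(scid & FIRST_TIER_MASK), scid & SCIDX_MASK


def is_valid_scptr(scptr: int) -> bool:
    """Check that SCPTR fits in 32 bits and is 64-byte aligned."""
    return 0 <= scptr <= 0xFFFFFFFF and scptr % SCPTR_ALIGN == 0
```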

TABLE 9 describes the action taken by context cache module 510 based on control flags Force Evict and Force Tear Down. Host 100 is programmed suitably to ensure that security context ID is properly recycled and no packets for a recycled security context ID remain outstanding.

TABLE 9
CONTROL FLAGS FOR ACTIONS BY CONTEXT CACHE MODULE 510

Force Evict = 0, Force Tear Down = 0: Normal operation.

Force Evict = 0, Force Tear Down = 1: Teardown current security context after all outstanding packets within CP_ACE system 200 pertaining to this particular security context have been processed. In this mode, context cache module 510 clears Owner bit in SCCTL header in external memory 120, thereby handing security context ownership back to Host 100. Clearing of Owner bit by hardware 520 is indication to Host 100 that Teardown operation has been completed. For instance, context cache module can write 32 bytes and then clear the Owner bit. See also FIG. 17 illustrating Teardown.

Force Evict = 1, Force Tear Down = 0: Evict current security context to external memory 120 after all outstanding packets within CP_ACE system 200 pertaining to this particular security context have been processed. In this mode, context cache module 510 looks at Evict PHP Count in SCCTL to determine the number of bytes (0, 64, 96 or 128) to be evicted. Clearing of Evict Done bits by hardware 520 is indication to Host 100 that Evict operation has been completed. Evict operation will free a currently-occupied context cache 570, 575 location. See also FIG. 18 illustrating eviction process.

Force Evict = 1, Force Tear Down = 1: Teardown and Evict current security context after all outstanding packets within CP_ACE system 200 pertaining to this particular security context have been processed. In this mode, context cache module 510 clears Owner bit and Evict Done bits in SCCTL header in external memory 120, thereby handing security context ownership back to Host 100. Clearing of Owner bit and Evict Done bit by hardware 520 is indication to Host 100 that Teardown/Evict operation has been completed. In this mode, Context Cache module 510 looks at Evict PHP Count in SCCTL to determine the number of bytes (0, 64, 96 or 128) to be evicted. If Evict Count is 0, then context cache module 510 writes 32 bytes and then clears the Owner bit. See also both FIGS. 17 and 18.
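The four flag combinations of TABLE 9 can be summarized, purely as an illustrative sketch, by a small lookup. The function names and the returned strings are hypothetical; only the flag-to-action mapping and the bits cleared on completion are taken from TABLE 9.

```python
from typing import List


def context_cache_action(force_evict: int, force_tear_down: int) -> str:
    """Map the two TABLE 9 control flags to the named action."""
    actions = {
        (0, 0): "normal",
        (0, 1): "teardown",
        (1, 0): "evict",
        (1, 1): "teardown+evict",
    }
    return actions[(force_evict, force_tear_down)]


def bits_cleared_on_completion(force_evict: int, force_tear_down: int) -> List[str]:
    """SCCTL fields whose clearing signals completion to the Host."""
    cleared = []
    if force_tear_down:
        cleared.append("Owner")
    if force_evict:
        cleared.append("Evict Done")
    return cleared
```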

The security context structure in host memory 120 (DDR3/L2, e.g., 3550, 3520.3 in FIG. 20) is fetched by Context Cache module 510 on a demand basis. Given a particular EMIF architecture for DDR3 memory, the data structure is arranged to have maximum EMIF efficiency while fetching and updating security context. In FIG. 1, each processing engine or module (Encryption, Authentication, Air Cipher module and PHP (packet header processing)) is coupled to a security context RAM SCR 570, 575 that holds the control information to process ingress data blocks. This Context RAM 570, 575 is populated by cache control module 510 of FIG. 8 by module 510 splitting, or copying and processing and adding module-specific sections to, the host unified data structure on a per-connection basis into an engine-specific data structure for storage by the context RAM 570, 575.

In TABLE 10, a first fetchable section of security context has a security context control word (SCCTL, see also FIGS. 6-7 and SW word 1 in FIG. 9) that details the size, ownership and control information pertaining to the security context, including an Owner bit, an Evict Done bit-field, and a Fetch/Evict control field. This information is populated by Host 100. Other SCCTL bit fields that can be provided include a SCID filled by hardware, and a SCPTR filled by hardware.

TABLE 10
SECURITY CONTEXT CONTROL WORD SCCTL

Owner: Context Ownership bit, 0 = Host, 1 = CP_ACE HW 200. Host 100 hands over ownership to CP_ACE 200 before pushing any packet for given context. After Teardown, CP_ACE 200 relinquishes ownership back to Host 100 by clearing this bit. Host 100 can only set this bit, CP_ACE 200 can only clear the bit. Context cache module 510 monitors this bit during fetch operation. If this bit is zero (0) then the packets are marked as error and forwarded to default queue.

Evict Done: All 7 bits are set to zero when evict operation is completed. Controllable by either Host 100 or hardware 200.

Fetch/Evict Size: Host controlled. Info byte details sections within security context information to fetch/evict. Bit fields in this byte, and the two-bit codes used by each of them:
Fetch PHP Bytes (2 bits)
Fetch Encr/Air Pass1 (2 bits)
Fetch Auth bytes or Encr/Air Pass2 (2 bits)
Evict PHP bytes (2 bits)
Codes: 00 = Reserved, 01 = 64 bytes, 10 = 96 bytes, 11 = 128 bytes.

SCID: Security context ID, filled by Hardware.

SCPTR: Security context pointer, filled by Hardware.
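As an illustrative sketch of the Fetch/Evict info byte, the following decodes four 2-bit size codes into byte counts. TABLE 10 lists the four fields and the code values but does not state their bit positions, so the ordering used here (first-listed field in the two most significant bits) is an assumption, as are the function and field names.

```python
from typing import Dict, Optional

# Two-bit size codes from TABLE 10; 00 is Reserved (modeled as None).
SIZE_CODES = {0b00: None, 0b01: 64, 0b10: 96, 0b11: 128}

# Assumed order, most significant field first (not stated in TABLE 10).
FIELDS = (
    "fetch_php",
    "fetch_encr_air_pass1",
    "fetch_auth_or_encr_air_pass2",
    "evict_php",
)


def decode_fetch_evict(info_byte: int) -> Dict[str, Optional[int]]:
    """Decode the Fetch/Evict info byte into a byte count per field."""
    if not 0 <= info_byte <= 0xFF:
        raise ValueError("info byte must fit in 8 bits")
    sizes = {}
    for i, name in enumerate(FIELDS):
        shift = 6 - 2 * i
        sizes[name] = SIZE_CODES[(info_byte >> shift) & 0b11]
    return sizes
```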

FIGS. 10 and 12-14 respectively detail processing engines in FIG. 1 for Encryption 310 (FIG. 10), Authentication 320 (FIG. 12), Packet Header Processing PHP 410 or 460 (FIG. 13), and Air Cipher 370 (FIG. 14). Each processing engine has pipeline stages to carry out its module-specific task(s). Multiple engines can be cascaded by using cascaded Command Labels as in TABLE 6 to realize protocol-specific end-to-end cryptographic processing, see e.g. FIGS. 4 and 5 logical topologies. The letter-code legends for lines used in FIGS. 10 and 12-14 are the same as for FIG. 1:

p=Packet Data

c=Context Data

f=Configuration Data

(none)=Scheduler Data.

In FIG. 10, Encryption module 310 encrypts or decrypts payload from a desired offset in FIG. 9 using hardware encryption cryptographic cores. Encryption module 310 has an AES core, a 3DES core, a Galois multiplier core and a Soft Operational Modes block occupied, for example, by a mode control engine MCE of FIG. 11. Mode control engine MCE implements various confidentiality modes like ECB, CBC, CTR, OFB, GCM etc., as environment for and employing the AES, 3DES, or Galois multiplier core(s).

In FIG. 1, Context RAM 570 supports processing engines in FIG. 1 for Encryption (FIG. 10), Authentication (FIG. 12), Packet Header Processing PHP (FIG. 13), and Air Cipher (FIG. 14). A data structure of TABLE 11 is stored, e.g. by IPSEC PHP 410, in the encryption module-specific section in FIG. 6 in context RAM 570 before the information is used by encryption module 310 to process a data block from FIG. 9 packets forwarded for a particular context ID (SCID). (For analogous Context RAM data structures adapted for Authentication or Air Cipher, see TABLE 15 or TABLE 16.)

TABLE 11
DATA STRUCTURE FOR ENCRYPTION MODULE USE

Each entry gives the Field Name, the Write Access in brackets, and the Description.

EncryptionModeSel [s/w (ctxctrl)]: 0 = Actual crypto processing, 1 = NULL.

Default Next Engine-ID [s/w (ctxctrl)]: Bit field to Default Next engine, used if Cmd Label Absent Error is generated or Use Default Eng-ID is encountered in Cmd label.

EncryptionModeCtrlWord [s/w (ctxctrl)]: Multiple bytes specify encryption mode processing to implement GCM, ECB, CBC, xPON CTR, NIST CTR etc.

EncryptionKeyValue [s/w (key)]: Multiple bytes. Key used for cipher operation. This key also loadable in-band via option bytes.

EncryptionAux 1 [s/w (Aux 1)]: Stores second key for e.g., CCM.

EncryptionAux 2 [s/w (Aux 2)]: Used when encryption mode involves IV.

EncryptionAux 3 [s/w (Aux 3)]: Used when encryption mode involves Nonce.

EncryptionAux 4 (Aux data 4): Stores intermediate mode control data used for next block. Not loaded from host.

(The above fields EncryptionAux 1-4 store optional multiple bytes fields for auxiliary data. Each such Aux field can be loaded in-band. Mode control engine MCE does not alter Aux1 and may alter Aux2-4.)

PreCryptoDataStore [h/w]: Multiple bytes. The data stored in this context is used next time the context is active to create crypto block size quanta for AES/3DES engine core.

The TABLE 11 Encryption Mode Control Word has a format set out in TABLE 12. Write access is by s/w (ctxctrl).

TABLE 12
ENCRYPTION MODE CONTROL WORD FORMAT

Update Trailer In Every Chunk: Bit, if set, updates trailer data to FIG. 9 Trailer section in every FIG. 9 chunk, including SOP chunk.

Update Trailer After Length Processed: Bit, if set, updates trailer data to FIG. 9 Trailer section of buffer only after last crypto block has been processed. This trailer data is repeated for subsequent chunks of same packet.

Packet Data Section Update: Bit, if set, updates processed data to FIG. 9 Packet Data section of buffer.

Encrypt/Decrypt: Bit (0/1).

EncryptionBlkSize: 0 = 8 bytes, N = 8 bytes × 2^N.

ModeCtrlInstrOffset: 12-bits. Instruction offset for SOP, MOP and EOP data block.

ModeCtrlInstrs: Multiple bytes for Mode Control instructions.
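The EncryptionBlkSize encoding of TABLE 12 is a power-of-two scaling of an 8-byte base. A minimal sketch follows; the function name is hypothetical, and the width of the field is not stated in TABLE 12, so no upper bound is enforced here.

```python
def encryption_block_size_bytes(n: int) -> int:
    """Block size in bytes for field value N: 8 bytes x 2^N."""
    if n < 0:
        raise ValueError("field value must be non-negative")
    return 8 << n
```

For example, N = 0 gives the 8-byte block of DES/3DES and N = 1 gives the 16-byte block of AES.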

In FIG. 11, a Mode Control Engine (MCE) 610 promotes a higher level of security and more flexibility to accommodate each engine or module circuit 600, e.g. module 310 or 370, to various different encryption/decryption modes. Basic encryption processing by cryptographic cores 615.i is complemented with encryption operational modes by MCE 610, such as a first MCE 610.1 in module 310 and a second MCE 610.2 in Air Cipher module 370. Encryption operational modes define an additional level of processing or staging before cryptographic cores 615.i are engaged. Encryption operational modes are either specified by NIST publications or are defined by the application specification. Some of the NIST modes are CBC, OFB, ECB and CTR (Counter), whereas a few popular application modes are GCM, CCM, F8, CMAC etc. As more and more encryption operational modes are developed in the industry, there is a need to achieve the encryption operational modes via a software-controlled programmable engine that can be updated to support each new encryption operational mode. An embodiment module 600 with MCE 610 and crypto cores 615.i answers this need.

This programmable mode control engine MCE embodiment has a programmable micro-instructed engine to carry out Mode Processing, all as described herein, and can be updated in the field to support new modes. Some of the implemented modes are ECB (Electronic code book), CBC (Cipher block chaining), CFB (Cipher feedback), OFB (Output feedback), CTR (Counter), F8, F9, CBC-MAC (Cipher block chaining-Message authentication code), CCM (Counter with CBC-MAC), GCM (Galois counter mode), GMAC, and AES-CMAC.

The MCE hardware embodiment 600 of FIG. 11 creates an environment around native cryptographic cores 615.i (AES, 3DES, Galois multiplier, etc. in FIG. 10) that allows additional software- or firmware-defined custom processing before or after crypto processing by the native cores 615.i. MCE 610 also enables storing of parameters for subsequent rounds of execution, thereby conferring the ability to process crypto data based on a previous round (history) rather than based on only the current round. Note in FIG. 11 the two-way register access between control plane and data plane, such as for monitoring and control.

In FIG. 11, this remarkable mode control engine MCE handles mode processing via a programmable engine that provides flexibility of realizing various types of cryptographic mode processing while at the same time delivering performance beyond or greatly exceeding that of a general purpose programmable processor. Mode control engine MCE programmably sequences or schedules various logical, arithmetic and cryptographic operations to achieve, e.g., a specified confidentiality mode and continually keeps one or more cryptographic hardware cores engaged. MCE is fast because it creates an environment around and uses one or more of these fast, native hardware cryptographic (Crypto) cores (AES, 3DES etc). MCE is flexible and economical of chip real estate because MCE programmably executes firmware (see, e.g., discussion of FIG. 22) based on an instruction set (TABLE 13) specifically for cryptographic application, and that permits updates to add custom processing before or after crypto processing by the Crypto core(s). MCE also enables storing of parameters for subsequent rounds, thereby conferring the ability to process crypto data based on each previous round (history) rather than based on only a current round.

In FIG. 11, the MCE has an MCE core 610 including decode logic and execute logic that respectively decode and execute micro-instructions of TABLE 13, which are devised especially for cryptographic mode processing. Sequences of these micro-instructions are loaded beforehand into the Instruction Array block 605 and are accessed by the decode logic. The execute logic is supported by an ALU (arithmetic logic unit) and a Register Bank 620 in the MCE core. Bit fields from the instructions in the instruction array 605 or instruction decoder, and controls decoded from an instruction by the instruction decoder, can be suitably transferred directly to any other block in the MCE as appropriate to effectuate any operations that the instructions are coded to represent. Crypto core scheduler logic is provided in the MCE core 610 to respond to instructions and to handshake with Crypto cores 615.i.

In FIG. 11, notice the structural parallelism in the MCE hardware to support the control plane and data plane structures of MCE. Context data and configuration data (c, f) are fed by a first MCE input bus from context RAM 570, 575 to a Crypto Context Data input storage block 640 that in turn is coupled to the Register Bank 620. Packet data (p) are fed by a second MCE input bus to an Input Data Block 650. A first MCE output (c, f) bus emanates from a Crypto Context Data output storage block 660 that in turn is coupled to and fed from the Register Bank 620. A second MCE output bus emanates from a Processed Data Block 670 and conveys processed data (p) from MCE core or its Crypto cores. A Crypto Padding Logic block 680 is also controlled by the MCE core and Proc_Pad instruction and selectively couples the MCE core to any one, some or all of its Crypto cores, and padding operation is supported when appropriate. (In FIG. 10 particular crypto cores are coupled to MCE, e.g., as shown in FIG. 11. In FIG. 14, another such MCE as in FIG. 11 is coupled with AES, Kasumi, and Snow3G cores instead.) A shared data bus 630 of MCE is controllably used to couple (or isolate) any two, several or all six of the Crypto Context Data input block 640, Register Bank 620, Input Data Block 650, Crypto Context Data output block 660, Processed Data Block 670, and the Crypto Padding Logic 680. In all these ways the control plane and data plane structures are endowed with controllably parallel operations for data transfers respective to each of them.

The sequences of micro-instructions tune the operations of flexible hardware of FIGS. 10-11 at run-time to implement a given mode which may include cryptographic algorithmic processing (AES, 3DES etc). These micro-instructions can be altered or added while a device with MCE is in the field to endow MCE with newly defined modes.

Each instruction is, e.g., 12 bits wide, where the first 4 bits are the opcode and the remaining 8 bits serve as operands. The instructions execute sequentially for every encryption block and the data-out is produced at the last instruction. Since the start, middle and end of block (SOP, MOP, EOP) in a packet may need a different sequence of operations, Mode Control Engine also allows three different starting points for instruction execution.

In FIG. 11, MCE processes the mode operations in parallel with native cryptographic core processing. It uses 128-bit registers and 128-bit arithmetic operations to realize a specified operational mode. MCE also can trigger multiple cryptographic engines and cores (e.g., AES, 3DES and Galois multiplier of FIG. 10) on the same data block to achieve confidentiality processing (encryption 310) and source authentication (hashing 320) in a single MCE pass.

An assembler process for MCE is described later hereinbelow using FIGS. 21-22.

MCE is a programmable engine that sequences various logical and arithmetic operations to achieve each encryption operational mode with high performance. Encryption mode operation is specified by EncryptionModeCtrlWord of TABLES 11-12 that has the format of TABLE 12 and is stored within the encryption module-specific section of the security context of FIG. 6. Security context holds the instructions for Soft Mode Control Engine to specify the sequence of logical operations to achieve each desired encryption operational mode.

EncryptionModeCtrlWord, detailed in TABLE 12, is made up of offset fields ModeCtrlInstrOffset and an actual instructions field ModeCtrlInstrs. The ModeCtrlInstrOffset offset fields are: SOP offset (4 bits), MOP (Middle) offset (4 bits), EOP offset (4 bits). The actual instructions field ModeCtrlInstrs holds Mode Control Engine MCE instructions occupying a number of bits given by (MaxModeInstr*12), e.g. with MaxModeInstr set to 16. (This MaxModeInstr can be instantiated as the size of the Instruction Array hardware, or alternatively in some embodiments be included as a parameter MaxModeInstr in EncryptionModeCtrlWord.) Because the mode processing is different (as described for FIG. 22) for start of packet SOP, middle of packet MOP, and end of packet EOP, soft Mode Control Engine MCE allows three different starting points for instruction execution. These starting points are specified in SOP offset, Middle offset and EOP offset, e.g., bit fields in ModeCtrlInstrOffset of TABLE 12.
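The three starting points can be illustrated with a short sketch that splits the 12-bit ModeCtrlInstrOffset into its three 4-bit offsets and computes the size of ModeCtrlInstrs. The placement of the SOP offset in the most significant nibble is an assumption (the text gives only the order SOP, MOP, EOP), and the function names are hypothetical.

```python
from typing import Dict

MAX_MODE_INSTR = 16  # example value given in the text
INSTR_BITS = 12      # width of one MCE instruction


def split_mode_ctrl_instr_offset(offset_word: int) -> Dict[str, int]:
    """Split the 12-bit offset word into SOP, MOP and EOP start offsets."""
    if not 0 <= offset_word < (1 << 12):
        raise ValueError("ModeCtrlInstrOffset is a 12-bit field")
    return {
        "SOP": (offset_word >> 8) & 0xF,  # assumed most significant nibble
        "MOP": (offset_word >> 4) & 0xF,
        "EOP": offset_word & 0xF,
    }


def start_index(offset_word: int, position: str) -> int:
    """Instruction index at which execution starts for SOP, MOP or EOP."""
    return split_mode_ctrl_instr_offset(offset_word)[position]


def mode_ctrl_instrs_bits(max_mode_instr: int = MAX_MODE_INSTR) -> int:
    """Size of ModeCtrlInstrs in bits: MaxModeInstr * 12."""
    return max_mode_instr * INSTR_BITS
```

With MaxModeInstr set to 16, the instructions field occupies 192 bits, and a 4-bit offset is exactly wide enough to address any of the 16 instruction slots.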

In FIG. 11, the Mode Control engine MCE has four 128-bit registers that are used as a buffering Register Bank 620 as well as TABLE 13 instruction-specifiable processing registers Reg0-3. These registers also receive the FIG. 11 context “c” information such as TABLE 11 “Data-in” (EncryptionModeSel, Default Next Engine-ID, EncryptionModeCtrlWord) via context RAM 570 and Crypto Context Data register 640 from PHP 410 or 460 or Host 100 to realize any mode function. These registers also receive FIG. 11 configuration data “f” as crypto parameters in TABLE 11 like Key (EncryptionKeyValue), EncryptionAux 1, EncryptionAux 2, EncryptionAux 3. On every new round, the Data-in (e.g., Plaintext) is automatically loaded into register Reg0, and similarly the EncryptionAux 1, Aux 2, Aux 3 are auto-loaded to registers Reg1, Reg2 and Reg3 respectively. EncryptionAux 4 restores the value of register Reg3.

Depending on embodiment or configuration, the Data-in can be auto-loaded as a predetermined number of data bytes (e.g. 16 bytes as in TABLE 32) for processing. This means that in some embodiments fewer than all the packet data bytes (e.g. 256 bytes in Packet Data section of FIG. 9) are processed in each round, so that multiple rounds are used to process a chunk in such cases. Also, the embodiment of FIG. 1 can process, e.g., a 16-byte portion of one chunk while concurrently processing a respective other 16-byte portion of that chunk or each of one or more other chunks in other engines or cores in subsystem 200. Various other embodiments may process all the Packet data bytes in a chunk in one round or even process all the Packet data bytes in more than one chunk in one round.

The MCE instructions as described using TABLE 13 are carefully devised keeping various encryption operational modes in view to balance the architectural and computational complexity and performance.

In FIG. 11, Instructions for MCE arrive via an Instruction Array or buffer and are passed to the instruction decoder in MCE. The following TABLE 13 teaches and describes the remarkable instructions and their instruction format according to which the instruction decoder of mode control engine MCE of FIG. 11 is straightforwardly implemented to convert any instruction to control signals for execution circuits that themselves, and/or together with a scheduler for the hardware crypto cores, electronically carry out the operations that each instruction is coded to represent. Each instruction is 12 bits wide, where the first 4 bits are the opcode and the remaining 8 bits serve as operands. This regularity in the instruction width and format of all instructions allows structuring the instruction store in rectangular form of an Instruction Array in FIG. 11 as well as economical, swift decoding of instructions from the Instruction Array by the Decode logic. The first column in TABLE 13 is the opcode, followed by three fields that can be used to specify source and destination. Certain instructions like WAIT_OUT are special instructions that are geared towards performance and carry out multiple operations in a single cycle.
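The fixed 12-bit format lends itself to a very simple decoder. The following sketch splits an instruction word into the 4-bit opcode and the Field2 (2-bit), Field1 (3-bit) and Field0 (3-bit) operands named in the TABLE 13 heading. It assumes that "first" means most significant and that the fields follow in the order Field2, Field1, Field0; numeric opcode assignments are not given in the text, so opcodes are left as raw integers. Function names are hypothetical.

```python
from typing import Dict


def decode_mce_instruction(word: int) -> Dict[str, int]:
    """Split a 12-bit MCE instruction word into opcode and three fields."""
    if not 0 <= word < (1 << 12):
        raise ValueError("MCE instructions are 12 bits wide")
    return {
        "opcode": (word >> 8) & 0xF,  # 4 bits, assumed most significant
        "field2": (word >> 6) & 0x3,  # 2 bits
        "field1": (word >> 3) & 0x7,  # 3 bits
        "field0": word & 0x7,         # 3 bits
    }


def encode_mce_instruction(opcode: int, field2: int, field1: int, field0: int) -> int:
    """Inverse of decode_mce_instruction, with range checks."""
    if not (0 <= opcode < 16 and 0 <= field2 < 4 and 0 <= field1 < 8 and 0 <= field0 < 8):
        raise ValueError("field value out of range")
    return (opcode << 8) | (field2 << 6) | (field1 << 3) | field0
```

A 4-bit opcode allows exactly sixteen distinct instructions, which matches the number of entries listed in TABLE 13.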

Among its other remarkable instructions, the MCE has PROC, PROC_MASK and PROC_PAD instructions that orchestrate the hardware crypto cores that the MCE programmably controls. PROC, PROC_MASK, and PROC_PAD instructions activate the MCE Crypto Core Scheduler circuit in FIG. 11 to cause instruction-designated crypto core(s) to operate and handshake with the Crypto Core Scheduler Circuit. PROC_PAD also activates the Crypto Padding Logic in FIG. 11. WAIT, OUT and OUTSET are a trio of instructions that interrelate MCE operations and crypto core operations as described in the tabulation and use the handshake with the Crypto Core Scheduler Circuit.

The remarkable PROC_MASK instruction in encryption module's MCE engine (FIG. 11) supports partial bytes in GCM mode, such as for WiMax mesh networking. A remarkable pad instruction PROC_PAD is provided in the MCE engine to ease, or reduce burden on, Firmware from padding.

Furthermore, a JUMP instruction is remarkably based on packet logic responsive to: SOP, MOP, EOP, or Not-EOP. JUMP circuitry has a SOP detector, MOP detector and EOP detector coupled to the packet buffer and/or register associated therewith. The Field0 value for SOP, MOP, EOP or not-EOP in the JUMP instruction is decoded to provide an enable for the respective SOP detector, MOP detector and EOP detector. The MCE has a Program Counter (PC) that ordinarily is incremented by MCE clock to generate addresses to MCE instruction array RAM space, thereby to sequence through the MCE software program. When a JUMP instruction is encountered in the program, the enabled SOP detector, MOP detector or EOP (or Not-EOP) detector provides an output signal active. That detector output signal enables a jam circuit that jams the jump address in, or pointed to by, the JUMP instruction into the Program Counter (PC) of the MCE to cause a jump by MCE to the jump address. Specifically, in the tabulated JUMP instruction of TABLE 13, the jump address is formed by an adder that increments the PC by an instruction Offset value in fields 2 and 1 of the JUMP instruction. TABLE 12 or 17 can also provide a bit field ModeCtrlInstrOffset defining Offset for SOP, MOP and EOP data block. Some embodiments provide the detectors as comparators associated with a packet parser that finds a SOP, MOP or EOP packet field. Some embodiments provide a MOP detector as logic that responds after SOP has occurred and currently not-SOP and not-EOP for the packet. Another embodiment has a MOP detector as a comparator fed with a packet byte counter that detects when the data stream for the packet has reached a certain programmed byte-count value in a field of TABLE 12 or 17 representing a particular position that indicates, e.g., MOP as start-of-payload or some other significant MOP position in the packet or offset from starting byte of the packet. Logic detects if that bit field is non-zero, and if so, uses that bit field instead of a default value for the comparator. In any of these ways, the remarkable MCE with its special JUMP instruction facilitates processing of packets where the desired operations are specific to, or depend on, the SOP, MOP, and EOP position or status in a packet. An unconditional (Always) jump code can also be put in Field0.
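The effect of the JUMP instruction on the Program Counter can be sketched as follows, using the Field0 condition codes tabulated in TABLE 13 (0 = Always, 1 = SOP, 2 = MOP, 3 = EOP, 4 = no EOP). This is a simplified model under assumptions: the 5-bit offset is formed with Field2 as the upper two bits and Field1 as the lower three bits, the packet position is supplied as a string rather than by detector circuits, and the function name is hypothetical.

```python
def next_pc(pc: int, field2: int, field1: int, field0: int, position: str) -> int:
    """Return the next Program Counter value after a JUMP instruction.

    position is "SOP", "MOP" or "EOP" for the current data block.
    """
    if position not in ("SOP", "MOP", "EOP"):
        raise ValueError("position must be SOP, MOP or EOP")
    conditions = {
        0: True,               # Always
        1: position == "SOP",  # Jump if SOP
        2: position == "MOP",  # Jump if MOP
        3: position == "EOP",  # Jump if EOP
        4: position != "EOP",  # Jump if no EOP
    }
    if field0 not in conditions:
        raise ValueError("condition code not listed in TABLE 13")
    offset = ((field2 & 0x3) << 3) | (field1 & 0x7)  # assumed field order
    # Taken: the adder increments the PC by the offset; otherwise fall through.
    return pc + offset if conditions[field0] else pc + 1
```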

The MCE instruction set (ISA) combines with the foregoing a powerful set of ALU instructions for bit-wise XOR, AND, OR, and INC; a shift instruction LSFT; two load instructions CP (copy) and LD (load); and no-op NOP. Bit-wise XOR is important, among other things, for providing XOR for crypto operations as well as using XOR to perform a comparison. An instruction is called blocking that pauses MCE core until a given Crypto core signals Done, and a non-blocking instruction leaves MCE core free to run during execution by a Crypto core.
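A minimal software model of these ALU operations on 128-bit register values is sketched below, including the use of XOR as a comparison. The function names are hypothetical, and the wrap-around of INC and the discarding of bits shifted out by LSFT are assumptions about behavior at the 128-bit boundary.

```python
MASK128 = (1 << 128) - 1  # registers are 128 bits wide


def alu(op: str, src1: int, src2: int = 0) -> int:
    """Apply one ALU operation and truncate the result to 128 bits."""
    if op == "XOR":
        result = src1 ^ src2
    elif op == "AND":
        result = src1 & src2
    elif op == "OR":
        result = src1 | src2
    elif op == "INC":
        result = src1 + 1      # assumed to wrap at 128 bits
    elif op == "LSFT":
        result = src1 << src2  # bits shifted past bit 127 are dropped
    elif op == "CP":
        result = src1          # Src2 not involved
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & MASK128


def equal_via_xor(a: int, b: int) -> bool:
    """Two values are equal exactly when their XOR is all zeroes."""
    return alu("XOR", a, b) == 0
```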

In an example TABLE 13, the Mode Control Engine (MCE) has 16 instruction opcodes assigned distinct binary values. See also assembler example TABLE 32 with FIGS. 21-22 description later hereinbelow. Each opcode has multi-bit fields Field2, Field1, Field0. To avoid repetition of verbiage in TABLE 13 note that, unless otherwise indicated, Field 2 throughout TABLE 13 can indicate a destination (Dst) register Reg0, 1, 2, 3 by a corresponding 2-bit representation. Also, unless otherwise indicated, Fields 1, 0 throughout TABLE 13 can each indicate a particular one of multiple Source 2 (Src2) or Source 1 (Src1) categories, each with four registers Reg j=0, 1, 2, 3 and with j=4 for Key[127:0] or j=5 for Key[255:128], by corresponding multi-bit representation. (Numerals like j=0, 1, 2, 3, . . . 7 represent possible values j for an entry to a given Field i, a particular such value j electronically entered with j in binary form.) Depending on the applicable EngineID (encryption 310, authentication 320, air cipher 370) to which the MCE OUTSET information pertains, references to an Aux in TABLE 13 refer to an EncryptionAux of TABLE 11, an AuthenticationAux of TABLE 15 or to an AirCipherAux of TABLE 16.

TABLE 13
INSTRUCTION FORMAT FOR MODE CONTROL ENGINE MCE

Columns: Opcode (4-bits); Field2 (2-bits); Field1 (3-bits); Field0 (3-bits); Description.

PROC: Process instruction to activate selected crypto core using data from Src1 for crypto processing. Use Src2 for Core-and-Key Select to select crypto processing core along with Key select, whereas Core-Misc provides data to selected crypto core of a module where MCE resides; see TABLE 14 (or TABLE 18 for Air Cipher). PROC is a non-blocking command, thereby providing ability to prepare for next round while selected crypto core executes.

PROC_MASK: Same as PROC except output data of PROC_MASK is masked based on actual valid bytes present that particular round.
Field2: Core-Misc, see TABLE 14 (or TABLE 18 for Air Cipher).
Field1: Core-and-key select; TABLE 14; or TABLE 18 for Air Cipher.
Field0: Src1: 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3, 4 = Key[127:0], 5 = Key[255:128].

PROC_PAD: Applies selected padding to last block of packet based on number of valid bytes in last crypto block. Executed with FIG. 11 Crypto Padding Logic 680.
Field2: Dst Reg0-3.
Field1: Padding sequence. 0 = 000....., 1 = 010..., 2 = 1000..., 3 = 1100...
Field0: Src1: 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3.

WAIT: Blocking instruction until crypto core finishes the current run, whereupon Src1 is stored to Dst. WAIT Field1 entry can also be 6 = Data from crypto core. WAIT Field0 entries are either 6 = Data from crypto core, or 7 = Data from crypto core XOR'ed with Src2 (Field1).

OUT: Outputs all the fields (IV, nonce, data-out) as pre-set by OUTSET instruction, thereby completing the current iteration. OUT is last instruction executed for a current run of MCE.

WOUT: WAIT and OUT are combined for high performance. Also called WAIT_OUT.

OUTSET: Sets source that goes out as Aux 3, Aux 2 and data-out. Non-blocking instruction, thereby gives ability to prepare the output before Done is sensed from crypto core. If WAIT_OUT is next after OUTSET, blocks until crypto core issues Done. When Done, all fields are output from crypto core and current iteration is marked as complete. OUTSET is executed as last instruction for current run of MCE. Field2, 1, 0 are specified as follows.
Field2: Aux-3 Select: 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3. Dst Reg: 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3.
Field1: Aux-2 Select: 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3, 4 = Data from crypto core, 5 = WAIT_OUT instruction Src1 XOR'ed with WAIT_OUT instruction Src2, 6 = Data from crypto core XOR'ed with WAIT_OUT instruction Src1, 7 = Data from crypto core XOR'ed with WAIT_OUT instruction Src2. 0 = Reg0, 1 = Reg1, 2 = Reg2, 3 = Reg3, 4 = Aux1[127:0], 5 = Aux1[255:128], 6 = Data from crypto core, 7 = Zeroes.
Field0: Same way as Field1 above except provides Dataout-select instead of Aux-2 Select; and Src2 code 7 instead means Data from crypto core XOR'ed with Src2.

JUMP: Jump instruction. Fields 2, 1 form Immediate value, instruction offset. Field0 is a Condition code: 0 = Always, 1 = Jump if SOP, 2 = Jump if MOP, 3 = Jump if EOP, 4 = Jump if no EOP.

XOR: Bitwise-XOR Src1 with Src2 and store result in Dst.

AND: Bitwise-AND Src1 with Src2 and store result in Dst.

OR: Bitwise-OR Src1 with Src2 and store result in Dst.

CP: Copy Src1 content to Dst. (Src2 not involved.)

INC: Increment value in Src1 and write to Dst.

LD: Immediate instead of Src2, 1. Load Dst with constant value.

LSFT: Left shift Src1 based on Shift value in Src2.

NOP: No operation instruction.

TABLE 14
CORE AND KEY TABLE FOR PROC_MASK INSTRUCTION OF MCE IN ENCRYPTION MODULE

Columns: Core and Key Select [3 bits] (Field1); Core-Misc [2 bits] (Field2).

Null: Core and Key Select [2:0] = 0. Core-Misc: 00.

AES Core: Core and Key Select [1:0] = 1; [2] = 0 => AES Key from Key-in, [2] = 1 => AES Key from Aux 1. Core-Misc: 00 = 128 bits key, 01 = 192 bits key, 10 = 256 bits key.

DES/3DES: Core and Key Select [1:0] = 2; [2] = 0 => DES/3DES Key from Key-in, [2] = 1 => DES/3DES Key from Aux 1. Core-Misc: 00 = DES mode, 01 = 3DES mode.

Galois Multiplier core: Core and Key Select [1:0] = 3; [2] = 0 => Galois Key from Key-in, [2] = 1 => Galois Key from Aux 1. Core-Misc: 00.

Note: Aux 1 refers to EncryptionAux 1 of TABLE 11 and in Encryption module-specific section of Security Context of FIG. 6. Regarding Key-in, see TABLE 11 EncryptionKeyValue and TABLE 13 Key[:].
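The TABLE 14 encodings can be exercised with a short decoder sketch. The function name and the returned strings are hypothetical. Encodings that TABLE 14 does not list (for example a Core and Key Select value of 4, or a Core-Misc value of 11) are rejected here, which is an assumption about how unlisted codes might be treated.

```python
from typing import Optional, Tuple

CORES = {1: "AES", 2: "DES/3DES", 3: "Galois multiplier"}
CORE_MISC = {
    "AES": {0b00: "128-bit key", 0b01: "192-bit key", 0b10: "256-bit key"},
    "DES/3DES": {0b00: "DES mode", 0b01: "3DES mode"},
    "Galois multiplier": {0b00: None},
}


def decode_core_and_key(select: int, core_misc: int) -> Tuple[str, Optional[str], Optional[str]]:
    """Decode Field1 (Core and Key Select) and Field2 (Core-Misc)."""
    if not 0 <= select <= 7 or not 0 <= core_misc <= 3:
        raise ValueError("select is 3 bits and core_misc is 2 bits")
    if select == 0:
        return ("Null", None, None)
    core = CORES.get(select & 0b11)      # bits [1:0] choose the core
    if core is None:
        raise ValueError("encoding not listed in TABLE 14")
    key_source = "Aux 1" if select & 0b100 else "Key-in"  # bit [2]
    if core_misc not in CORE_MISC[core]:
        raise ValueError("Core-Misc value not listed for this core")
    return (core, key_source, CORE_MISC[core][core_misc])
```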

In FIG. 12, Authentication module 320 provides data integrity protection and source authentication to security packets. The authentication subsystem hosts SHA1, MD5, SHA2-224 and SHA2-256 hashing hardware cores to compute a digest that is used for data integrity checks. Authentication module 320 also supports keyed hash computation as per HMAC to provide source authentication used with any of the supported hardware hashing cores.

For high performance, particularly for small packets, some embodiments only support HMAC from pre-computed inner/outer hash. The host 100 processor carries out an initial key preparation stage to generate an inner pad and outer pad. Suitable data structure and sequence of processing are provided and implemented.
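The benefit of pre-computing the inner and outer hash can be illustrated in software. In the sketch below, the key preparation stage hashes the key XOR'ed with the standard HMAC inner and outer constants (0x36 and 0x5C) once, and each later packet only resumes from copies of those two saved states. This uses Python's hashlib and hmac modules as a stand-in for the hardware hashing cores; the function names are hypothetical, and the sketch does not represent the stored format of TABLE 15.

```python
import hashlib
import hmac


def precompute_hmac_states(key: bytes, hash_name: str = "sha256"):
    """Key preparation: hash (key XOR ipad) and (key XOR opad) once."""
    block_size = hashlib.new(hash_name).block_size
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\x00")
    inner = hashlib.new(hash_name, bytes(b ^ 0x36 for b in key))
    outer = hashlib.new(hash_name, bytes(b ^ 0x5C for b in key))
    return inner, outer


def hmac_from_states(inner, outer, message: bytes) -> bytes:
    """Per-packet work: resume from copies of the two saved states."""
    i = inner.copy()
    i.update(message)
    o = outer.copy()
    o.update(i.digest())
    return o.digest()


def matches_stdlib(key: bytes, message: bytes, hash_name: str = "sha256") -> bool:
    """Check the sketch against the standard library HMAC."""
    inner, outer = precompute_hmac_states(key, hash_name)
    expected = hmac.new(key, message, hash_name).digest()
    return hmac_from_states(inner, outer, message) == expected
```

For small packets the saving is proportionally large, since two hash-block computations over the padded key are moved out of the per-packet path.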

The data structure is stored beforehand by PHP 410 or 460 or by Host 100 in Context RAM 570 for use by the Authentication module 320. Authentication module 320 uses this information to process the FIG. 9 data block when packets are forwarded for a particular Security Context ID. TABLE 15 sets forth a data structure example. See also the Authentication module-specific section in FIG. 6, or applicable Air Cipher integrity section in FIG. 7.

In some other embodiments, Authentication module 320 is also provided with its own processor such as MCE for handling or controlling involved authentication operations now and in the future. FIG. 12 economically lacks such MCE.

TABLE 15 DATA STRUCTURE FOR AUTHENTICATION MODULE 320
Field Name | Write Access | Description
AuthenticationModeSel | s/w (ctxctrl) | Bit: 1 = NULL, 0 = Actual Hash processing.
Default Next Engine-ID | s/w (ctxctrl) | Multi-bit. Default Next engine, used if Cmd Label Absent Error is generated or Use Default Eng-Id is encountered in Cmd label.
AuthenticationSWControl | s/w (ctrl) | Bit fields: Bit A: Upload hash every chunk. 1 => Upload hash in Trailer section of every data chunk. Initial data chunks will have partial computed hash. 0 => do not upload Trailer section in every chunk. Bit B: Computed hash upload control. 1 => Upload computed hash to Trailer TLR section only after complete specified length has been processed. Completed hash repeated for all subsequent chunks in same packet. 0 => Do not upload computed hash to Trailer TLR section of buffer. Bit C: HMAC or basic hash. 0 => HMAC, 1 => basic hash bits. Bits D: Authentication core select field selects core for authentication operation. 0 => NULL, 1 => MD5, 2 => SHA1, 3 => SHA2-224, 4 => SHA2-256.
AuthenticationLength | s/w (ctrl) | Multiple bytes. 1 = Authentication length is overridden for EOP packet or chunk via firmware. 0 = Let hardware calculate the length based on actual bytes hashed.
AuthenticationKeyValue | s/w (key) | Multiple bytes. Master Key or Pre-computed inner digest for HMAC Hash(Key XOR Inner Constant). The inner pad is padded to 256 bits by adding padding bits towards LSB.
AuthenticationAux1 | s/w (Aux 1) | Optional Multiple bytes. Pre-computed outer pad ‘opad’ for HMAC, hash carries over opad, i.e. Hash(Key XOR Outer Constant). Outer digest is padded to 256 bits by adding padding bits towards LSB.
AuthenticationAux 2 | s/w (Aux 2) | Optional multiple-bytes field stores partial hash if current block lacks complete packet. This value is restored into authentication core when next block of same packet is active.
PreCryptoDataStore | h/w | Multiple-bytes data to be stored in this context that is used the next time the context is active to create crypto block size quanta for the AES/3DES and/or SHA/MD5 engine.
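The pre-computed inner and outer digests named in TABLE 15 can be illustrated with a minimal Python sketch. It assumes standard HMAC with SHA-1 and the usual inner and outer constants 0x36 and 0x5C; the function names are hypothetical, and the copyable hashlib states merely stand in for the stored AuthenticationKeyValue and AuthenticationAux1 values. It does not model the 256-bit padding or any register format of module 320.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-1 block size in bytes

def precompute(key: bytes):
    """Return hash states seeded with (key XOR inner const) and (key XOR outer const)."""
    if len(key) > BLOCK:
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK, b"\x00")
    inner = hashlib.sha1(bytes(b ^ 0x36 for b in key))
    outer = hashlib.sha1(bytes(b ^ 0x5C for b in key))
    return inner, outer

def hmac_from_precomputed(inner, outer, message: bytes) -> bytes:
    """Finish an HMAC from copies of the two precomputed states."""
    i = inner.copy()          # the stored state is reused for every packet
    i.update(message)
    o = outer.copy()
    o.update(i.digest())      # the additional hashing round that closes the keyed hash
    return o.digest()

inner_state, outer_state = precompute(b"example-key")
tag = hmac_from_precomputed(inner_state, outer_state, b"packet payload")
```

Because both key-dependent blocks are hashed once per connection, each packet costs only the message hashing plus one closing round.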

An Air Cipher PHP 460 structure for the control plane is the same as or similar to that of IPSEC PHP 410 of FIG. 13, so FIG. 13 is re-used as a diagram of Air Cipher PHP 460 with analogous description except for Air cipher processing.

In FIG. 14, Air cipher module 370 provides an Air cipher interface that carries out the task of encrypting/decrypting FIG. 9 payload consistent with 3GPP air interface security. The air cipher subsystem 370 does data plane processing using AES, Kasumi or Snow3G cores. Software-operable Mode Control Engine MCE is re-used from or analogous to MCE in encryption subsystem 310 to allow F8, CBC or F9 processing using Kasumi, AES or Snow3G mode.

To support Air Cipher module 370 processing of a FIG. 9 data block, Air Cipher PHP 460 (or Host 100) stores a data structure for the applicable inbound or outbound Air Cipher module-specific section of FIG. 7 in context RAM 570 of FIG. 1 before FIG. 9 packets or chunks are forwarded to Air Cipher module 370 for a particular Security Context ID. This data structure to support Air Cipher is detailed in TABLE 16. The reader may compare and contrast TABLE 16 with the separate data structure TABLE 11 in RAM 570 and FIG. 6 for supporting encryption module 310.

TABLE 16 DATA STRUCTURE FOR AIR CIPHER MODULE 370
Field Name | Write Access | Description
AirCipherModeSel | s/w (ctxctrl) | Bit: 0 = Actual crypto processing, 1 = NULL.
Default Next Engine-ID | s/w (ctxctrl) | Multi-bit Default Next engine, used if Cmd Label Absent Error is generated or Use Default Eng-ID is encountered in Cmd label.
AirCipherModeCtrlWord | s/w (ctxctrl) | Multiple bytes specify AirCipher mode processing for modes: GCM, ECB, CBC, xPON CTR, NIST CTR, etc. See TABLE 17.
AirCipherKeyValue | s/w (key) | Multiple bytes used for cipher operation. This key can also be loaded in-band via option bytes.
AirCipherAux 1 | s/w (Aux 1) | Optional multiple bytes field used to store auxiliary data to support Air Cipher modes like CCM to store second key. Can be loaded in-band via option bytes in Cmd label. Mode control engine MCE cannot alter the value of this field.
AirCipherAux 2 | s/w (Aux 2) | Optional second Aux multiple bytes field used if AirCipher mode involves IV. This value is alterable by Mode Control Engine MCE and loadable in-band via option bytes.
AirCipherAux 3 | s/w (Aux 3) | Optional third Aux data multiple bytes field used if the AirCipher mode involves Nonce. This value is alterable by Mode Control Engine MCE and loadable in-band via option bytes.
AirCipherAux 4 (Aux data 4) | h/w | Multiple bytes Aux data 4 to store intermediate mode control data to be used for next block. This space cannot be loaded from main host, but can be loaded in-band via option bytes.
PreCryptoDataStore | h/w | Multiple bytes data to be stored in this context that is used the next time the context is active to create crypto block size quanta for AES/Kasumi/Snow3G engine.

TABLE 17 tabulates the format of the important TABLE 16 word designated AirCipherModeCtrlWord.

TABLE 17 FORMAT OF AirCipherModeCtrlWord
Field Name | Description
Update Trailer In Every Chunk | Bit, if set, updates trailer data to FIG. 9 Trailer section in every FIG. 9 chunk, including SOP chunk.
Update Trailer After Length Processed | Bit, if set, updates trailer data to FIG. 9 Trailer section of buffer only after last crypto block has been processed. This trailer data is repeated for subsequent chunks of same packet.
Packet Data Section Update | Bit, if set, updates processed data to FIG. 9 Packet Data section of buffer.
Encrypt/Decrypt | Bit (0/1).
EncryptionBlkSize | 0 = 8 bytes, N = 8 bytes × 2^N.
ModeCtrlInstrOffset | 12-bits Instruction offset for SOP, MOP and EOP data block.
ModeCtrlInstrs | Multiple bytes for Mode Control instructions.

In FIG. 14, the Air Cipher data plane module 370 somewhat resembles the Encryption module 310 of FIG. 10. Air Cipher module 370 has an In-Packer and Out-Packer flanking a central execution core Air_core_top. This execution core has a Soft Operational Modes block. For this block, a soft Mode Control Engine MCE like that in FIG. 11 and TABLE 13 is provided to achieve a high level of security, but wherein Air cipher encryption by AES, Kasumi, or Snow3G hardware cryptographic cores is mostly complemented with Air Cipher operational modes, which the flexible MCE in FIG. 14 readily establishes. The Air Cipher operational modes define an additional level of processing or staging before the cryptographic cores are engaged. The flexibility of MCE beneficially complements the speed of the cryptographic cores. Air Cipher operational modes that can be specified for module 370 include the NIST modes CBC, OFB, ECB and CTR (Counter), and some other supported application modes are CCM, F8, CMAC etc. (See AirCipherModeCtrlWord in TABLES 16-17.) As more and more air cipher operation modes are developed in the industry, mode control engine MCE answers a need to achieve the air cipher operational modes flexibly via its software controlled programmable engine that can be updated to support new air cipher operational modes. MCE is a programmable engine that sequences various logical and arithmetic operations to achieve air cipher operational modes with high performance essential to execute such modes flexibly.

Air Cipher mode operation is specified by AirCipherModeCtrlWord (see TABLES 16, 17 and 12) that is stored in Context RAM 570 as part of the security context that holds the instructions for soft Mode Control Engine in FIG. 14 and FIG. 11 to specify the sequence of logical operation to achieve each desired air cipher operational mode.

Details of Mode Control Engine MCE for Air Cipher module 370 of FIG. 14 and its instruction format are the same as in the description of FIG. 11 and are the same as in TABLE 13 except that the PROC_MASK instruction for Air Cipher MCE in FIG. 14 is specified using TABLE 18 Core and Key select information to support TABLE 13 description of the instruction set.

TABLE 18 CORE AND KEY SELECT FOR PROC_MASK INSTRUCTION OF MCE IN AIR CIPHER MODULE 370
Core and key select (3-bits) (Field1) | Core-Misc (2 bits) (Field2)
[2:0] = 0 => Null | 00
[1:0] = 1 => AES Core | 00 = 128 bits key
[2] = 0 => AES Key from Key-in | 01 = 192 bits key
[2] = 1 => AES Key from Aux 1 | 10 = 256 bits key
[1:0] = 2 => Kasumi Core |
[2] = 0 => Kasumi Key from Key-in | 00 for all
[2] = 1 => Kasumi Key from Aux 1 |
[1:0] = 3 => Snow3G core |
[2] = 0 => Snow3G Key from Key-in | [0] = 1 => Init Key
[2] = 1 => Snow3G Key from Aux 1 | [1] = 1 => Store Snow3G state
Note: Aux 1 refers to AirCipherAux 1 of TABLE 16 and in Air Cipher module-specific section of Security Context of FIG. 7.
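As an illustration of how the TABLE 18 fields might be interpreted, the following Python sketch decodes Field1 and Field2 into a core, a key source, and core-specific options. The function name and the returned dictionary keys are hypothetical and are not part of the MCE instruction set; combinations that TABLE 18 does not define are rejected rather than guessed.

```python
CORES = {0: "NULL", 1: "AES", 2: "Kasumi", 3: "Snow3G"}
AES_KEY_BITS = {0b00: 128, 0b01: 192, 0b10: 256}

def decode_proc_mask(field1: int, field2: int) -> dict:
    """Decode the 3-bit core/key select and the 2-bit Core-Misc field."""
    if not (0 <= field1 <= 7 and 0 <= field2 <= 3):
        raise ValueError("Field1 is 3 bits and Field2 is 2 bits")
    core = CORES[field1 & 0b11]              # bits [1:0] select the core
    if core == "NULL":
        if field1 != 0:
            raise ValueError("TABLE 18 defines Null only for [2:0] = 0")
        return {"core": core}
    # Bit [2] selects the key source for every non-null core.
    result = {"core": core,
              "key_source": "Aux 1" if field1 & 0b100 else "Key-in"}
    if core == "AES":
        if field2 not in AES_KEY_BITS:
            raise ValueError("undefined AES key size code")
        result["key_bits"] = AES_KEY_BITS[field2]
    elif core == "Snow3G":
        result["init_key"] = bool(field2 & 0b01)      # Field2 bit [0]
        result["store_state"] = bool(field2 & 0b10)   # Field2 bit [1]
    # Kasumi: TABLE 18 lists 00 for all entries, so nothing further is decoded.
    return result
```

For example, `decode_proc_mask(0b101, 0b10)` describes an AES operation with a 256-bit key taken from AirCipherAux 1.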

The FIG. 14 Snow3G core in Air Cipher module 370 saves and restores an internal state of, e.g., 76-bytes while processing intermediate chunks. Hence, this 76-bytes state value is stored in an Authentication part (EngineID=Authentication Module code-value) of the security context (See FIG. 6). Air Cipher 370 using Snow3G core uses the encryption section (engine ID=Encryption Module code-value). As part of key initialization for Snow3G core, a multi-byte IV (Initialization Vector for key derivation) is picked or obtained from register Reg1 of MCE register bank 620. Therefore, MCE instructions ensure that an Initialization Vector IV is stored at register Reg1 before issuing a PROC instruction (TABLE 13) that involves key initialization.

Returning to FIG. 13, each Packet header processor (PHP) Module 410 or 460 is part of the control plane of FIG. 1 that parses and inspects security headers to establish the sequence of processing to be carried out on the packet. The Header processing PHP subsystem hosts a PDSP RISC CPU to carry out control plane operations. PDSP Pro in FIG. 13 is connected to tightly coupled memories to allow faster access to packet data. Packet header processor PHP module has an instruction RAM that is populated by host 100 as part of initialization. This firmware holds the control plane code as per IPSEC, SRTP or 3GPP standards to parse and inspect ingress packet headers.

A Descriptor information word (see FIGS. 2, 3 and 9) provides control information about the current data chunk thereby providing various lengths and other fields. The format and definition of each field is suitably specified.

In FIG. 13, the PHP module is complemented with a security context viewer module that provides a rolling window view of the security context. This allows easy access and update of security context data to PDSP firmware as the window is directly mapped to PDSP registers.

Following are the commands that can be issued by PDSP to adjust the position of window and indicate DONE to the security context viewer module Context Viewer in FIG. 13. A security context viewer command register has one byte designated Offset and another byte designated Operation. The Offset byte specifies an offset (e.g., 0 to 255) from start of security context (FIG. 6 or 7) where the window is to be positioned. The Operation byte specifies a command code signaling the type of operation to perform: SCV_CMD_POSITION_WINDOW 0x1, and SCV_CMD_DONE 0x2. (SCV refers to the security context viewer.)

Context RAM 570 of FIG. 1 also supports the PHP module of FIG. 13 with a data structure of TABLE 19 pre-stored by Host 100 or Context Cache Module 510 in the Context RAM 570 before packets are forwarded for a particular Security Context ID. The data structure information is in the PHP module-specific section in FIG. 6 or FIG. 7 and is used to process the data block using the information in TABLE 19.
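A security context viewer command built from the Offset and Operation bytes can be sketched as follows. The two command codes come from the description above; the placement of the Operation byte in the upper byte and the Offset byte in the lower byte of a 16-bit value is an assumption made only for illustration, as is the helper name `scv_command`.

```python
SCV_CMD_POSITION_WINDOW = 0x1
SCV_CMD_DONE = 0x2

def scv_command(operation: int, offset: int = 0) -> int:
    """Pack an SCV command; the byte layout here is assumed, not specified."""
    if operation not in (SCV_CMD_POSITION_WINDOW, SCV_CMD_DONE):
        raise ValueError("unknown SCV operation code")
    if not 0 <= offset <= 255:
        raise ValueError("Offset is a single byte (0 to 255)")
    # Assumed layout: Operation in bits [15:8], Offset in bits [7:0].
    return (operation << 8) | offset

# Position the window 64 bytes into the security context, then signal DONE.
position = scv_command(SCV_CMD_POSITION_WINDOW, 64)
done = scv_command(SCV_CMD_DONE)
```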

TABLE 19 DATA STRUCTURE FOR PHP MODULE 410 or 460
Field Name | Write Access | Description
SCCTL | s/w (ctxctrl) | Multiple bytes. As in context cache module, SCCTL field contains SCID, SCPTR and other control flags, TABLE 10
FirmwareReadWriteSpace | s/w and H/w | Multiple bytes. Firmware Read and write space. This information is used by firmware to maintain dynamic parameters like rolling window markers etc. This section is updated by hardware when the context is evicted to external memory.

A set of address ranges (each is a pair of numbers [:]) are adopted as pre-specified system constants for the PDSP, as templated in TABLE 20. RXPKT means Receive Packet (Ingress), TXPKT means Transmit Packet (Egress). PHP1 is IPSEC PHP 410, PHP2 is Air Cipher PHP 460 in FIG. 1.

TABLE 20 SYSTEM CONSTANTS FOR ADDRESS RANGES
C0 | Scratch1_LRAM0 BASE
C1 | Scratch2_LRAM1 BASE
C20 | TRNG True Random number generator base address
C21 | PKA Public key accelerator base address
The following constants hold pairs of ranges for PHP1 and PHP2. PHP2 Ditto for each of these:
C6 | PHP1 CDE_Sideband RXPKT
C7 | PHP1_CDE_Sideband TXPKT
C8 | PHP1_CDE_Sideband HELDPKT
C10 | PHP1 Random Number FIFO control Block
C11 | PHP1 Packet Instance Base Address
C12 | PHP1 Temporary storage of Aux (ICV) Data
C13 | PHP1 Temporary storage of Command Label Table
C14 | PHP1 Global Statistics
C15 | PHP1 Random Number FIFO base address
C16 | PHP1 IPSEC ESP Tx Command Label Processing Table
C17 | PHP1 IPSEC ESP Rx Command Label Processing Table
C18 | PHP1 IPSEC AH Tx Command Label Processing Table
C19 | PHP1 IPSEC AH Rx Command Label Processing Table

Returning to FIG. 1, CP_ACE subsystem 200 hosts a Public key accelerator module PKA that is accessible via memory mapped registers. The PKA module provides a high-performance public key engine to accelerate the large vector math processing that is involved in Public Key computations.

The public key engine of PKA provides the following basic operations: Large vector add, Large vector subtract, Large vector compare (XOR), Vector shift left or right, Large vector multiply, Large vector divide, and Large vector exponentiation. PKA can execute a Diffie-Hellman exponentiation operation for high security based on modulus sizes up to large numbers of bits and large exponents. A small amount of additional software processing is executed on the Host 100 processor as well. Operand and result vectors are stored in a multi-Kbytes vector RAM. The vectors are sequentially cycled through the processing engines of the PKA, with intermediate products from large or complex operations temporarily stored in RAM as well. The Host configures PKA for the intended operation, providing proper operand data, and allocating space for the result vector.

In FIG. 1, a True Random number (TRNG) Module provides a non-deterministic random number generator to assist host with key derivation operations like IKE etc. This can also be used to create an initialization vector for certain encryption modes. CP_ACE hosts true random number generator TRNG, which can be accessed via memory mapped registers MMR.

Some memory mapped registers MMR to configure and control various features of cryptographic engine CP_ACE of FIG. 1 are described hereinbelow.
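The Diffie-Hellman exponentiation mentioned above reduces to modular exponentiation on large integers. The following Python sketch shows only the arithmetic, using the built-in three-argument `pow`; the toy modulus, generator, private values and function names are illustrative assumptions and say nothing about how PKA stores vectors or sequences its engines. Real deployments use moduli of many hundreds or thousands of bits.

```python
def dh_public(generator: int, private: int, modulus: int) -> int:
    """Public value g^x mod p."""
    return pow(generator, private, modulus)

def dh_shared(peer_public: int, private: int, modulus: int) -> int:
    """Shared secret (peer public value)^x mod p."""
    return pow(peer_public, private, modulus)

# Toy parameters chosen only so the numbers are easy to follow.
p, g = 23, 5
a_private, b_private = 6, 15

a_public = dh_public(g, a_private, p)
b_public = dh_public(g, b_private, p)

# Both sides arrive at the same secret without transmitting it.
a_secret = dh_shared(b_public, a_private, p)
b_secret = dh_shared(a_public, b_private, p)
```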

TABLE 21 MEMORY MAPPED REGISTERS
CMD_STATUS | See TABLE 22.
CTXCACH_CTRL | See TABLE 23.
CTXCACH_SC_ID | See TABLE 24.
CTXCACH_SC_PTR | Context Cache Security Context Pointer Register for MMR based fetch. RW 0x0.
CTXCACH_MISSCNT | Context Cache miss count.
BLKMGR_PA_BLKS | Number M of packet blocks reserved for PA Port in units of 4 blocks to ensure that PA and CDMA flows do not stall each other. CP_ACE system has N total blocks. CDMA Port flow gets N/4 − M such units. See also Block Manager 380.
PA_FLOWID | PA Port default CPPI Flow ID used for packet coming from PA Ingress port. RW 0x0
CDMA_FLOWID | CDMA Port default CPPI Flow ID, ditto.
PA_ENG_ID | PA Port default Next engine ID to select first processing engine within CP_ACE if Default Engine ID select code is detected in incoming CPPI SW word0 word. RW 0x10
CDMA_ENG_ID | Ditto for CDMA Port default Next engine ID
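The BLKMGR_PA_BLKS arithmetic in TABLE 21 can be sketched in Python as follows. The function name and the example totals are hypothetical; the only relationship taken from TABLE 21 is that M units of 4 blocks go to the PA Port and the remaining N/4 − M units go to the CDMA Port.

```python
def partition_blocks(total_blocks: int, pa_units: int) -> dict:
    """Split N total packet blocks between the PA and CDMA ports."""
    if total_blocks % 4:
        raise ValueError("total block count is assumed to be a multiple of 4")
    total_units = total_blocks // 4            # N/4 units of 4 blocks each
    if not 0 <= pa_units <= total_units:
        raise ValueError("M must lie between 0 and N/4")
    cdma_units = total_units - pa_units        # N/4 - M units for CDMA
    return {"pa_blocks": 4 * pa_units, "cdma_blocks": 4 * cdma_units}

# Hypothetical example: N = 64 blocks, M = 6 units reserved for the PA Port.
split = partition_blocks(64, 6)
```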

Command Status Register CMD_STATUS from TABLE 21 includes, for each of the blocks listed in TABLE 22, a bit-pair consisting of a read-only busy status bit (_BUSY) generated by the respective block and an enable bit (_EN) that is read/writeable (R/W) by firmware. All resets are to non-busy, non-enabled statuses.

TABLE 22 COMMAND STATUS REGISTER
BLOCK (each with BIT-PAIR _BUSY, _EN)
PA CPPI Ingress port
PA CPPI Egress port
CDMA CPPI Ingress port
CDMA CPPI Egress port
Security context cache module
PHP1 IPSEC Packet Header Processing module
PHP2 Air Cipher Packet Header Processing module
PKA module*
TRNG module*
Encryption module*
Authentication module*
Air Cipher hardware module*
*E-fused enable _EN. Also, an e-fuse enable is provided to enable the subsystem 200.

The Context Cache Control Register CTXCACH_CTRL from TABLE 21 is detailed in TABLE 23.

TABLE 23 CONTEXT CACHE CONTROL REGISTER
Field Name | Description | Type | Reset
BUSY | Bit, if set, indicates that context cache engine is busy. | R | 0x0
CTX_CNT | Current cached security context multi-bit count. | R | 0x0
CLR_STATS | Setting this bit clears context cache statistics. Auto-cleared. | RW | 0x0
CDMA_PORT_EN | Enable CDMA ctxcach port. If this port is disabled, no look-up nor auto-fetch will happen for security context for packets coming on this port. | RW | 0x1
PA_PORT_EN | Enable PA ctxcach port. If port is disabled, no look-up nor auto-fetch will happen for security context for packets coming on this port. | RW | 0x1
CLR_CACHE_TABLE | Clear internal cache table. This bit clears after operation is completed. Cache table is auto cleared after reset. | RW | 0x0
AUTO_FETCH_EN | Enable Auto fetch for security context | RW | 0x1

The Context Cache Security Context Identification Register CTXCACH_SC_ID from TABLE 21 is detailed in TABLE 24.

TABLE 24 CONTEXT CACHE SECURITY CONTEXT IDENTIFICATION REGISTER
Field Name | Description | Type**
DONE | Done bit set indicates operation is completed. | R
SC_ERRORCODE | Return Error code bits. Return of zero means success. | R
SC_RAMIDX | Return Ram index byte. | R
GO | Go bit. Setting this bit will execute selected action. | RW
SC_TEAR | Tear-down selected SCID. | RW
SC_FETCH_EVICT | If set Evicts selected SCID. If reset Fetch selected SCID. | RW
SC_ID | SCID for MMR based fetch | RW
BUSY | If set, Busy bit indicates that context cache engine is busy. | R
CTX_CNT | Current cached security context multi-bit count. | R
CLR_STATS | Setting this bit clears context cache statistics. Auto cleared. | RW
CDMA_PORT_EN | Enable CDMA ctxcach port. If this port is disabled then no look-up nor auto-fetch will happen for security context | RW
**Types are R: Read; RW: Read/Write. Reset for all fields is to 0x0, except PORT_EN which is reset-enabled to 0x1.

Host polls the system of FIG. 1, for example. Other embodiments may provide for interrupts to Host. Different embodiments or options provided therein support a specified or configured endian type. Security context is formed as shown in context cache section. Host swaps words based on system width configuration to ensure that memory print of security context is same in either endian.

CP_ACE is suitably clocked by a main clock (e.g., 350 MHz) and a synchronous divide-by-two off main clock to drive cryptographic cores like PKA, PKA RAM, and TRNG. Internal clock gating shuts down clock to any of various cryptographic cores in response to Host/PDSP via a memory mapped register MMR based on current operational mode, and provided a Done acknowledgment is received from an affected core. See, e.g., TABLE 22 with module-specific enable ‘_EN’=0.

In FIG. 1, the CDMA Ingress CPPI Streaming interface is used to receive packet data from CPPI DMA (CDMA) for packets coming from Host and has controls tabulated in TABLE 25.

TABLE 25 CONTROLS FOR CDMA INGRESS CPPI STREAMING INTERFACE
Signal Pin Name | In/Out Type | Function
cp_ace_pktstrm_incdma_thread_sready | Out. | Indicates that CP_ACE's CDMA Ingress port currently has buffering to accept a block of data.
cp_ace_pktstrm_incdma_thread_id | In. | Thread ID: Indicates the thread that is currently occupying the streaming interface. Multi-bit with log2 number of threads.
cp_ace_pktstrm_incdma_req | In. | Request: when asserted indicates that all of the other information on the bus is valid.
cp_ace_pktstrm_incdma_data_type | In. | Data Type indicates the type of data that is being transferred on the data bus. Multi-bit.
cp_ace_pktstrm_incdma_req_thread_id | In. | Request Thread ID indicates the target thread to which data will be transferred on the following clock cycle.
cp_ace_pktstrm_incdma_worden | In. | Word Enable: Indicates which 32-bit words on the interface are valid. Primarily used on interfaces wider than 32-bits to allow one or more optional words to be included/excluded during the data phase transfer. Not used for the payload data phases.
cp_ace_pktstrm_incdma_xcnt | In. | Data Phase Byte Count: Indicates how many payload bytes are transferred during the current data phase. Pertinent for payload data phases.
cp_ace_pktstrm_incdma_data | In. | Data: The info, control, PS, and payload data word.
cp_ace_pktstrm_incdma_sop | | Start of Packet Indicator: Asserted co-incident with the start for the block, to indicate that a new packet is starting.
cp_ace_pktstrm_incdma_eop | In. | End of Packet Indicator: Asserted to indicate the close of a packet.
cp_ace_pktstrm_incdma_drop | In. | Drop Packet Indicator: Asserted to indicate that the current packet in this thread should be dropped at the destination.
cp_ace_pktstrm_incdma_pkt_error[3:0] | In. | Packet Error Indicator bit indicates if an error occurred during reception of this packet. 0 = No error occurred, 1 = Error occurred. Additional information about different errors may be encoded in the error flags fields.

In FIG. 1, the PA Ingress CPPI Streaming interface is used to receive packet data from PA port. TABLE 26 tabulates controls for this interface.

TABLE 26 CONTROLS FOR PA INGRESS CPPI STREAMING INTERFACE
(Analogous to TABLE 25 for simplicity of architecture. Substitute “pa” for “cdma” in TABLE 25 wherever “cdma” occurs in TABLE 25 to obtain TABLE 26.)
cp_ace_pktstrm_inpa_thread_sready | Out | This signal indicates that CP_ACE's PA Ingress port currently has buffering to accept a block of data.
. . . etc.

Controls for CDMA Egress CPPI streaming interface are listed in TABLE 27. Notice that for simplicity of architecture, these controls substitute “out” for “in” in TABLE 25 wherever “in” occurs in TABLE 25 field designators to obtain TABLE 27. Note that the first control entry in TABLE 27 is somewhat differently worded than the first control entry in TABLE 25.

TABLE 27 CONTROLS FOR CDMA EGRESS CPPI STREAMING INTERFACE
cp_ace_pktstrm_outcdma_thread_mready | Out. | Master Thread Ready: Indicates which threads currently have valid information waiting to be transferred to the slave. Multi-bit field with number of bits equal to number of threads.
etc.

TABLE 28 CONTROLS FOR PA EGRESS CPPI STREAMING INTERFACE
(Analogous to TABLE 27 for simplicity of architecture. Substitute “pa” for “cdma” in TABLE 27 wherever “cdma” occurs in TABLE 27 field designators to obtain TABLE 28.)

The memory map of the FIG. 1 subsystem is suitably allocated to the various storage structures, such as in TABLE 29, so they are addressable. AIHM means “All internal hardware modules.” AHE means “All hardware engines.” Respective sizes are suitably adopted for the various structures in the design process, and their values are accumulated to determine address offsets from some base address to establish addresses for all the memory-mapped structures.

TABLE 29 MEMORY MAPPED STORAGE STRUCTURES
(Columns: Offset, Size, Region, Primary Access. Offset and Size values are left to the design process.)
Region | Primary Access
MMR/Ctxcach registers | Host/PDSP
PDSP 0, 1 Control/Status Registers | Host/Debugger
PDSP 0, 1 Debug Registers | Debugger
PDSP 0, 1 Program Memory | Host/Debugger
PDSP Scratch Memory 0, 1 | Host/PDSP/CDE
CDE 0, 1 Sideband Memory Interface | PDSP0, 1 respectively
PKA module, Vector RAM starts at offset. | Host/PDSP
TRNG module | Host/PDSP
PA Ctxcach module Lookup Port | AIHM
PA Ctxcach module EOP Port | AIHM
CDMA Ctxcach module Lookup Port | All internal Hardware modules
CDMA Ctxcach module EOP Port | AIHM
Block Manager module | AIHM
Packet RAMs 0-5 | AHE; Host/PDSP can read-only
PA CPPI Egress Port | AIHM
CDMA CPPI Egress Port | AIHM
IPSEC PHP scheduler port | AIHM
Encryption module scheduler port | AIHM
Authentication module scheduler port | AIHM
SRTP/Air Cipher PHP scheduler port | AIHM
Air cipher module port | AIHM

In FIG. 1, the subsystem can provide 1.4 Gbits/sec high performance on Ethernet traffic while running at 350 MHz for IPSEC and SRTP protocols. This subsystem also can process 400 Mbits/sec of air cipher traffic as defined by 3GPP in parallel to IPSEC. In order to provide the IPSEC/SRTP performance, the internal hardware cores like AES, 3DES, SHA1 etc. are able to saturate the ingress traffic bit rate while running at 350 MHz.

Projected performance of various cores based on packet size is discussed next. The number of packets to be processed by the subsystem each second is called the packet rate. The packet rate for 1.4 Gbits/sec is a function of packet size. For 1.4 Gbits/sec Ethernet traffic, the subsystem processes 2.08 million 64-bytes packets per second. The number of packets per second decreases approximately inversely with increasing packet size.

Performance is also considered for the individual hardware cores in FIG. 1 on a most-burdensome case basis in various modes to process the 1.4 Gbits/sec of Ethernet traffic. In AES-CCM mode of encryption, for instance, a same packet payload is run twice for AES processing. In hashing, SHA1 using HMAC uses an additional hashing round to close the keyed hash.

TABLE 30 describes the performance of each individual core running at 350 MHz. Air cipher cores (Kasumi and Snow3G) are run at half the clock of the CP_ACE clock in this example. Size refers to Block size in bits. Cycles refers to cycles per block. Modes overhead is entitled Modes. Frequency (MHz) is entitled Freq. “Actual” refers to Actual Throughput (Mbits/sec), and “Goal” refers to Throughput Goal (Mbits/sec). Modules are also called cores.
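The packet-rate figure above can be reproduced with a short Python sketch. The 2.08 million figure is consistent with counting 20 bytes of per-frame Ethernet wire overhead (8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap) on top of each 64-byte packet; that overhead value is an assumption of this sketch and is not stated in the text, and the function name is hypothetical.

```python
LINE_RATE_BPS = 1.4e9
WIRE_OVERHEAD_BYTES = 20  # assumed: 8-byte preamble/SFD + 12-byte inter-frame gap

def packets_per_second(packet_bytes: int,
                       line_rate_bps: float = LINE_RATE_BPS,
                       overhead_bytes: int = WIRE_OVERHEAD_BYTES) -> float:
    """Packet rate for a given packet size at a given line rate."""
    bits_on_wire = (packet_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_on_wire

rate_64 = packets_per_second(64)      # about 2.08 million packets per second
rate_1500 = packets_per_second(1500)  # far fewer packets for large frames
```

Because the fixed overhead becomes negligible for large packets, the rate falls off approximately, not exactly, as the inverse of packet size.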

TABLE 30 PERFORMANCE OF CORES
Module | Size | Cycles | Modes | Freq | Actual | Goal | Remarks
AES core | 128 | 15 | 1 | 350 | 2,800.0 | 1,365.0 | 256-bit key nrs case
3DES core | 64 | 14 | 1 | 350 | 1,493.3 | 1,365.0 | 3 key nrs, case
Galois | 128 | 8 | 1 | 350 | 4,977.8 | 1,365.0 | Galois mult., GCM mode
AES-CCM 128/192 bits | 128 | 13 | 1 | 350 | 1,600.0 | 1,365.0 | Run twice for 1 key block
AES-CCM-256 bits | 128 | 15 | 1 | 350 | 1,400.0 | 1,365.0 | Run twice for 1 key block
Kasumi | 64 | 16 | 2 | 350 | 1,244.4 | 400.0 | Kasumi in F8 mode same
Snow3G** | 320 | 96 | 2 | 350 | 1,142.9 | 400.0 | See Note.
SHA1 | 512 | 81 | 1 | 350 | 2,185.4 | 1,386.0 | SHA 1 core
MD5 | 512 | 65 | 1 | 350 | 2,715.2 | 1,386.0 | MD5 core
SHA2 | 512 | 65 | 1 | 350 | 2,715.2 | 1,386.0 | SHA 2 core
HMACSHA1 | 512 | 81 | 1 | 350 | 2,185.4 | 2,133.0 | SHA 1 core
HMAC-MD5 | 512 | 65 | 1 | 350 | 2,715.2 | 2,133.0 | MD5 core
HMACSHA2 | 512 | 65 | 1 | 350 | 2,715.2 | 2,133.0 | SHA 2 core
**Note for Snow3G: 40 bytes in one block (38 cycles for first 4 bytes, 2 cycles each for subsequent 4 bytes, 40 cycles for store/restore), most-burdensome case store/restore each 40 bytes.
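The Actual column of TABLE 30 appears to be consistent with the relation Actual = Size × Freq / (Cycles + Modes), halved for the AES-CCM rows where the payload is run twice. This relation is inferred from the tabulated numbers and is not stated in the text, so the Python sketch below should be read as a cross-check under that assumption; the function names are hypothetical.

```python
def throughput_mbps(block_bits: int, cycles: int, modes: int,
                    freq_mhz: float = 350.0, passes: int = 1) -> float:
    """Inferred relation: bits per block times MHz over total cycles per block."""
    return block_bits * freq_mhz / (cycles + modes) / passes

def snow3g_cycles(block_bytes: int = 40) -> int:
    """Cycle count from the Snow3G note: 38 + 2 per further 4 bytes + 40."""
    words = block_bytes // 4
    return 38 + 2 * (words - 1) + 40

aes = throughput_mbps(128, 15, 1)                 # 2800.0
aes_ccm = throughput_mbps(128, 13, 1, passes=2)   # 1600.0
snow3g = throughput_mbps(320, snow3g_cycles(), 2) # about 1142.9
sha1 = throughput_mbps(512, 81, 1)                # about 2185.4
```

Each computed value can then be compared against the Goal column to confirm that the core keeps up with its target traffic rate.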

In FIGS. 15-19, description now turns to process embodiments for integration of the CP_ACE into a chip level context of FIG. 20.

In FIG. 15, an Initialization process embodiment has the following steps:

INITIALIZATION PROCESS, FIG. 15

-   1. Enable PHP1SS_EN and/or PHP2SS_EN in CMD_STATUS Register. (TABLE 22).
-   2. Download Firmware into PDSP's instruction RAM, see I-RAM, FIG. 13.
-   3. Enable PDSP by writing into PDSP registers.
-   4. Enable support by other hardware engine(s) by writing into CMD_STATUS Register.
-   5. Set up connection by forming CP_ACE specific security context in RAM 570, 575, using format in FIG. 6 or 7.
-   6. Queue packets to be processed by CP_ACE, e.g. by ingress into Packet RAM 265 and chunking using format in FIG. 9.

In FIG. 16, a security context setup process embodiment has the following steps, wherein Host and CP_ACE handshake to avoid race conditions.

SETTING UP SECURITY CONTEXT: PROCESS, FIG. 16

-   1. Host forms security context in Host memory at SCPTR address and allocates SCID.
-   2. Host (not SA) relinquishes ownership to CP_ACE by setting Owner bit in SCCTL to 1. (See TABLE 10.)
-   3. Host cannot make any more changes to security context after CP_ACE has been made owner.
-   4. Host queues packets with above SCPTR and SCID whenever packet is meant for this connection. Alternatively, Host can add security context via Memory map registers MMR.
-   5. CP_ACE gets SCID, SCPTR along with context control flags, per SCCTL in TABLE 10.
-   6. CP_ACE does internal look-up on SCID to check for cached connection.
-   7. Since this is first packet for given connection, internal look-up fails.
-   8. CP_ACE issues DMA to fetch security context using SCPTR.
-   9. CP_ACE checks for owner to be CP_ACE (i.e. Owner bit is set to 1 by host).
-   10. If owner is not CP_ACE (Owner bit is 0), CP_ACE drops the security context and marks packet as bad by setting corresponding error code.
-   11. If owner is CP_ACE (Owner bit is 1), CP_ACE fetches the complete security context.

In FIG. 17, a security context tear-down process embodiment has the following steps:
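The ownership handshake in the steps above can be modeled with a small Python sketch. Dictionaries stand in for Host memory and for the internal context cache, and the function name and return strings are hypothetical; the sketch only mirrors the look-up, fetch and owner-check order of the listed steps and does not model the DMA or the SCCTL layout.

```python
OWNER_HOST, OWNER_CP_ACE = 0, 1

def cp_ace_lookup(cache: dict, host_memory: dict, scid: int, scptr: int) -> str:
    """Model of steps 6 to 11: look up SCID, else fetch and check the Owner bit."""
    if scid in cache:
        return "hit"                       # connection already cached
    context = host_memory[scptr]           # stands in for the DMA fetch via SCPTR
    if context["owner"] != OWNER_CP_ACE:
        return "bad_packet"                # context dropped, packet marked bad
    cache[scid] = dict(context)            # complete security context is cached
    return "fetched"

host_memory = {0x1000: {"owner": OWNER_HOST, "keys": b"example"}}
cache = {}

early = cp_ace_lookup(cache, host_memory, scid=7, scptr=0x1000)
host_memory[0x1000]["owner"] = OWNER_CP_ACE   # step 2: Host relinquishes ownership
first = cp_ace_lookup(cache, host_memory, scid=7, scptr=0x1000)
second = cp_ace_lookup(cache, host_memory, scid=7, scptr=0x1000)
```

The early call shows why the Host sets the Owner bit before queuing packets: a packet that arrives first is marked bad rather than processed with a half-formed context.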

TEAR DOWN PROCESS, FIG. 17

1. Host sends tear-down packet to CP_ACE with No Payload and Tear-down bit set, see TABLE 3 and TABLE 9. Alternatively, Host can set tear-down bit in last packet.
2. Host ensures that no new packets are sent to this security context after tear-down packet has been sent.
3. CP_ACE records that given security context is to be subject to tear-down.
4. CP_ACE ensures that all packets within CP_ACE buffers are processed before tear-down action is executed.
5. Finally, CP_ACE clears owner bit (Owner bit, SCCTL, TABLE 10) to give control back to Host. Host is programmed so that, after launching the tear-down packet, host waits for an Ownership bit (Owner bit SCCTL) to be cleared as indication that the tear-down operation has been completed.
6. Host ensures that the same SCID is not used until tear-down operation is completed as indicated by clearing of Owner bit.

In FIG. 18, a process embodiment to evict security context has the following steps:

EVICT SECURITY CONTEXT: PROCESS, FIG. 18

1. Host writes all 1's in Evict Done bits in SCCTL, see TABLE 10.
2. Host sends packet with Force Evict flag set. Alternatively, Host can set evict information via memory mapped register.
3. When hardware completes evict operation, it changes Evict Done to all 0's.
4. Host senses change in state of Evict Done from all 1's to all 0's to know evict has been completed.

In FIG. 19, a process embodiment to choose Pass1/Pass2 engine ID, see TABLE 5, for data processing engines has the following steps:
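The evict handshake above amounts to a write, a trigger, and a poll. The Python sketch below shows that pattern with callables standing in for register access; the two-bit width of Evict Done, the poll limit, and the `FakeContext` stand-in are all assumptions made for illustration, since the actual field width is given by TABLE 10.

```python
def evict_and_wait(read_evict_done, write_evict_done, send_force_evict,
                   width_bits: int = 2, max_polls: int = 1000) -> bool:
    """Host side of the evict steps; returns True once Evict Done reads all 0's."""
    all_ones = (1 << width_bits) - 1
    write_evict_done(all_ones)        # step 1: write all 1's
    send_force_evict()                # step 2: packet with Force Evict flag
    for _ in range(max_polls):        # step 4: sense the change to all 0's
        if read_evict_done() == 0:
            return True
    return False

class FakeContext:
    """Stand-in for hardware that clears Evict Done after a few reads."""
    def __init__(self, polls_until_done: int = 3):
        self.evict_done = 0
        self.pending = False
        self.polls_until_done = polls_until_done

    def write(self, value: int) -> None:
        self.evict_done = value

    def force_evict(self) -> None:
        self.pending = True

    def read(self) -> int:
        if self.pending:
            self.polls_until_done -= 1
            if self.polls_until_done <= 0:
                self.evict_done = 0   # step 3: hardware signals completion
                self.pending = False
        return self.evict_done

ctx = FakeContext()
completed = evict_and_wait(ctx.read, ctx.write, ctx.force_evict)
```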

CHOOSE PASS1/PASS2 ENGINE ID: PROCESS, FIG. 19

1. Pass1 and Pass2 can be used in any order if same hardware engine is not used twice in the flow, for instance AUTH (Pass2)→ENCR (Pass1) and AUTH (Pass1)→ENCR (Pass2) are permissible.
2. If same hardware engine is used for both Encryption and Authentication, then second pass uses Pass2 engine ID. (See TABLE 5.) For instance, if Air Cipher hardware engine is used for both Kasumi-encryption and Kasumi-authentication for inbound flow (AUTH→ENCR), then Kasumi-authentication uses Pass1 code value, and Kasumi-encryption uses Pass2 code value.

Further, a process embodiment to remove the last chunk has the following steps.

This process is performed because the last chunk might contain as little as one byte.
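The Pass1/Pass2 rule above can be sketched in Python as follows. The function name and the tuple format are hypothetical, and the sketch arbitrarily assigns Pass1 to an engine that appears only once, which the first rule permits since either ID may be used in that case.

```python
def assign_passes(flow):
    """flow: ordered list of (operation, engine) pairs for one packet."""
    uses = {}
    assigned = []
    for operation, engine in flow:
        count = uses.get(engine, 0)
        if count >= 2:
            raise ValueError("only Pass1 and Pass2 engine IDs are described")
        # The second use of the same hardware engine takes the Pass2 engine ID.
        assigned.append((operation, engine, "Pass1" if count == 0 else "Pass2"))
        uses[engine] = count + 1
    return assigned

# Inbound Kasumi flow on the Air Cipher engine: AUTH first, then ENCR.
inbound = assign_passes([("AUTH", "AirCipher"), ("ENCR", "AirCipher")])
```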

REMOVE LAST CHUNK: PROCESS

1. Set “EOP” in CDE descriptor for second-last chunk.
2. Set “SOP”, “EOP” and “Drop” for last chunk (chunk to be removed).

CPPI/CP_ACE architectural parameters are listed next.

1. CPPI streaming control length may have a maximum for ingress packet length, e.g., some (power of two)-bytes or other number of bytes.
2. Regarding byte alignment, CPPI streaming control in some embodiments may have a desirable alignment (e.g., 8-bytes aligned).
3. Within CP_ACE, PHP PS length may be established as, e.g., multiple of 8-bytes. For PS Word, see FIG. 9.
4. Egress CPPI streaming control+CPPI streaming status length may have a maximum, e.g., some (power of two)-bytes or other number of bytes.
5. Egress CPPI streaming status may be established, e.g., as a multiple of 4-bytes. Notice this is different than internal PHP PS length of 8-bytes aligned.
6. CP_ACE outputs packet length as all-ones to CPPI DMA, thereby allowing CPPI DMA to count packet data length.

TABLE 31 explains CDE descriptor fields and mapping to Ingress CPPI streaming descriptor from the viewpoint of the FIG. 13 PHP PDSP. TABLE 31 also describes firmware processing for each of the fields. TABLE 31 helps describe Descriptor Area of FIG. 9 as well as the other fields/words/areas in FIG. 9.
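The two last-chunk removal steps listed earlier can be sketched in Python by treating each chunk's CDE descriptor flags as a dictionary. The dictionary keys and the function name are hypothetical; the sketch shows only which flags change and does not model the descriptor word layout.

```python
def remove_last_chunk(chunks):
    """Mark the final chunk for removal and close the packet one chunk earlier."""
    if len(chunks) < 2:
        raise ValueError("at least two chunks are needed to remove the last one")
    chunks[-2]["eop"] = True                          # step 1: EOP on second-last
    chunks[-1].update(sop=True, eop=True, drop=True)  # step 2: SOP, EOP and Drop
    return chunks

packet = [
    {"sop": True, "eop": False, "drop": False},
    {"sop": False, "eop": False, "drop": False},
    {"sop": False, "eop": True, "drop": False},   # last chunk, e.g. a single byte
]
remove_last_chunk(packet)
```

After the call, the second chunk ends the packet and the final chunk stands alone as a one-chunk packet that is dropped.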

TABLE 31 CDE AND INGRESS CPPI STREAMING DESCRIPTORS

Columns: CDE word and Field; Value set by HW at Ingress; Valid at chunk; FW access; FW must edit; Description.

WORD 0: In Word 0, Thread ID field is HW-allocated, valid at all chunks, and FW does not access nor edit. Thread ID chooses DMA channel on Egress. In Word 0, a CPPI Egress status length field is set at Ingress to CPPI streaming SW 2 “Status length”; if SW2 is not present then this field is set to zeroes. Valid at SOP chunk only. FW accesses to specify the valid PS Data size for EOP chunk Trailer section. FW need not edit this field, which specifies the Valid PS Data Size that is included from PS section of the EOP chunk. CPPI gets informed up-front with upcoming Egress status length. Egress status length is multiple of a predetermined number of bytes and specified in units of bytes as master length of status words and overrides any other PS (status) length. Further in Word 0, HW loads a Full Packet Length field with a value of complete packet length as reported by Ingress CPPI streaming word 1 Pkt Length. This is valid at SOP chunk only. FW does not access and need not edit this field. This field represents Total Reassembled packet length as informed by ingress CPPI DMA, which computes full packet size in its egress flow.

WORD 1: In Word 1, a Next Engine ID is loaded by HW from CPPI Packet streaming SW0 Engine ID or from Interface Default register if Use-Default is present in SW0 Engine ID. Valid at all chunks. Firmware accesses this Next Engine ID field to specify the next engine and edits this field if firmware is in the chunk path. Word 1 has a Command label Info field. HW inserts CPPI Packet streaming SW0 Cmdlbl Info valid at SOP Chunk only. Firmware specifies this command label info and edits it if firmware is in the chunk path. Command label info is made up of Command Label Present and Command Label Offset. Word 1 has a Valid PS Data Size field. At Ingress, HW inserts Zeroes. Field is valid at EOP chunk only. FW can access and change the Valid PS Size but may omit to do so. This Valid PS Data Size value goes out as a form of CPPI streaming status on EOP chunk. Further, Word 1 has a Physical PS Data size field. HW loads a value, e.g. 32, valid for all chunks. FW does not access nor edit this field, which is a hole that is used by HW to insert a computed hash value.

WORD 2: In Word 2, a Packet Type field is loaded by HW from CPPI streaming Word 0 Pkt Type, valid at SOP chunk only. FW does not access nor edit. A Word 2 field called Drop Bit is set by HW if No Payload is set in CPPI Packet streaming SW0. Valid for all chunks. FW can access this field in case FW would like to drop current packet, but FW does not edit this field. FW can set Drop Bit in any chunk. HW takes care to abort complete packet. HW sets a Word 2 SOP Bit field upon first chunk of packet, valid for SOP chunk only. FW can access this field in case FW is about to abort last chunk, but does not edit this indicator of first chunk of packet. FW uses this field to decode first chunk. HW sets a Word 2 EOP bit field at last chunk of packet, field valid for EOP Chunk only. FW can access this field in case FW is about to abort last chunk, but does not edit this indicator of last chunk of packet. FW uses this field to decode last chunk. HW sets a Word 2 PS Flags field with CPPI Packet streaming Word 0 PS Flags and the field is valid on all chunks. FW accesses to alter PS flags. This will change Error Code in CPPI descriptor. The last updated value goes out. A Word 2 Error Flags field is set by HW to Zeroes, valid on SOP chunk only. FW does not access nor edit Error Flags. Hardware engine (like Encryption) reports error in this field. HW sets a Word 2 Source ID field to the CPPI Packet streaming Word 0 Src-ID, valid on SOP chunk only. FW does not access nor edit Error Flags. A Word 2 Flow Index is set by HW with CPPI Packet streaming SW2 Flow Index, and if SW2 is not present then from MMR Flow Index register. Valid on SOP chunk only. FW accesses this field and specifies a new Flow Index if firmware is in the chunk path. CPPI Flow index is used to select destination queue parameters.

WORD 3: In Word 3, at ingress, HW sets a Control Data Size to CPPI Packet streaming PS length, counted by Ingress module, valid on SOP chunk only. CDE engine changes this value on Insert/Remove command. FW need not edit this field. CPPI PS data on ingress is used as CDE CTL data for PHP. Also in Word 3, at ingress, HW sets a Packet Data Size to Number of packet data bytes packed in current chunk, valid on all chunks. CDE engine changes this value on Insert/Remove command. FW need not edit this field. Ingress module packs up to 252 bytes of packet data into current chunk.

WORD 4: In Word 4, at ingress, HW sets a Packet Id/Destination Tag to CPPI Packet streaming Word 2 Dst_Tag, valid on SOP chunk only. FW does not access nor edit this field. Packet ID is set by PA instead. Also in Word 4, at ingress, HW sets a Word Destination Queue Manager field to the queue number represented by CPPI Packet streaming word SW2 Dest Queue Num. If SW2 is not present, then the field is set to all 1's. Valid on SOP chunk only. FW accesses this field to specify this CPPI destination queue info if FW is in the chunk path.

TIMESTAMP: The Timestamp word has a Timestamp field. On ingress, HW loads the Timestamp field with contents of a CPPI Packet streaming word Extended Packet Info Word 0, valid on SOP chunk only. FW does not access nor alter this field.

SOFTWARE DATA WORDS 0, 1: The Software Data Word 0 and 1 are loaded by HW on Ingress with contents of CPPI Packet streaming word Extended Packet Info Word 1 and 2 respectively, valid on SOP chunk only. FW can optionally access this field to pass custom data to other peripherals. SW 0 word, SW 1 word are not altered by hardware.

TRAILER SECTION WORDS: On ingress, HW loads Trailer section words (e.g. 8) with PS info Trailer section from CDE, valid on EOP chunk only. FW accesses optionally to change trailer data. HW sends trailer (CDE PS info) as CPPI streaming status on Egress side. Trailer section (if present) in EOP chunk only goes as status. Trailer section of all other chunks is ignored and not altered by hardware.

CONTROL SECTION: On ingress, HW loads the control section words (e.g. up to 16) from CPPI Packet streaming words called PS Section, valid on SOP chunk only. FW accesses optionally to change control data. HW on egress automatically removes control data from start of control until end of current command label.

PACKET DATA: On ingress, HW loads the Packet data with CPPI Packet streaming word called Packet Data, valid on all chunks. FW accesses optionally to change packet data. HW packs maximum number of bytes, e.g. 252 bytes, in one chunk, to allow FW to bypass whole chunk if desired.
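As an illustration of the chunk framing that TABLE 31 describes, the following Python sketch splits a packet into chunks of at most 252 bytes of packet data and marks the first and last chunk with SOP and EOP indicators. It is a software model under stated assumptions only: the names Chunk and split_into_chunks are hypothetical and are not taken from the hardware, and the sketch ignores the descriptor, control and trailer sections.

```python
from dataclasses import dataclass
from typing import List

# Maximum packet-data bytes per chunk, using the example value from TABLE 31.
MAX_CHUNK_PAYLOAD = 252


@dataclass
class Chunk:
    """Hypothetical software model of one chunk (packet data only)."""
    sop: bool           # set on the first chunk of a packet
    eop: bool           # set on the last chunk of a packet
    packet_data: bytes  # up to MAX_CHUNK_PAYLOAD bytes


def split_into_chunks(packet: bytes,
                      max_payload: int = MAX_CHUNK_PAYLOAD) -> List[Chunk]:
    """Split a packet into chunks, flagging the first (SOP) and last (EOP)."""
    chunks = []
    for offset in range(0, len(packet), max_payload):
        piece = packet[offset:offset + max_payload]
        chunks.append(Chunk(
            sop=(offset == 0),
            eop=(offset + max_payload >= len(packet)),
            packet_data=piece,
        ))
    return chunks
```

For a 600-byte packet this model yields three chunks carrying 252, 252 and 96 bytes, with SOP set only on the first and EOP only on the last. A packet that fits in a single chunk has both indicators set on that chunk.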

In FIG. 20, an embodiment improved as in the other Figures herein has one or more video codecs implemented in IVA hardware, video codec 3520.4, and/or otherwise appropriately to form more comprehensive system and/or system-on-chip embodiments for larger device and system embodiments. In FIG. 20, a system embodiment 3500 improved as in the other Figures has an MPU subsystem and the IVA subsystem, and DMA (Direct Memory Access) subsystems 3510.i. The MPU subsystem suitably has one or more processors with CPUs such as RISC or CISC processors 2610, and having superscalar processor pipeline(s) with L1 and L2 caches. The IVA subsystem has one or more programmable digital signal processors (DSPs), such as processors having single cycle multiply-accumulates for image processing, video processing, and audio processing. IVA provides multi-standard (H.264, H.263, AVS, MPEG4, WMV9, RealVideo®) encode/decode at D1 (720×480 pixels), and 720p MPEG4 decode, for some examples. A video codec for IVA is improved for high speed and low real-estate impact as described in the other Figures herein. Also integrated are a 2D/3D graphics engine, a Mobile DDR Interface, and numerous integrated peripherals as selected for a particular system solution.

Digital signal processor cores suitable for some embodiments in the IVA block and video codec block may include a Texas Instruments TMS320C55x™ series digital signal processor with low power dissipation, and/or TMS320C6000 series and/or TMS320C64x™ series VLIW digital signal processor, and have the circuitry and processes of the FIGS. 1-19 and 22 coupled with them as taught herein. For example, a 32-bit eight-way VLIW (Very Long Instruction Word) pipelined processor has a program fetch unit, an instruction dispatch unit, an instruction decode unit, two data paths and register files for them. The data paths execute the instructions. Each data path includes four functional units L, S, M, D, suffixed 1 or 2 for the respective data path. Control registers and logic, test logic, interrupt logic, and emulation logic are also included. Plural pixel data is packed into each processor data word. Luma and chroma pixel data may be expressed in 8 bits and packed into each, e.g., 32-bit data word. The data processing apparatus includes many instructions that operate in single instruction multiple data (SIMD) mode by separately considering plural parts of the processor data word. For example, an ADD instruction can operate separately on four 8-bit parts of the 32-bit data word by breaking the carry chain between 8-bit sections. Various manipulation instructions and circuits for the packed data are also provided. The IVA subsystem is suitably provided with L1 and L2 caches, RAM and ROM, and hardware accelerators as desired such as for motion estimation, variable length codec, and other processing.
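The packed ADD described above can be modeled in software. The following Python sketch is an illustration only, not the processor's instruction: the function name packed_add_u8x4 is hypothetical. It adds four 8-bit lanes held in one 32-bit word and emulates the broken carry chain by adding the low seven bits of each lane separately and then restoring the top bit of each lane with an exclusive-OR.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane


def packed_add_u8x4(a: int, b: int) -> int:
    """Add four 8-bit lanes of two 32-bit words; each lane wraps modulo 256.

    Summing only the low seven bits guarantees that a carry never crosses
    a lane boundary. The top bit of each lane is then a XOR b XOR carry-in,
    which the final XOR supplies.
    """
    low_sum = (a & LOW7) + (b & LOW7)
    return (low_sum ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF
```

For example, packed_add_u8x4(0x01FF7F10, 0x01010101) gives 0x02008011, whereas an ordinary 32-bit addition of the same operands gives 0x03008011 because the carry out of the 0xFF lane leaks into the neighboring lane.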

DMA (direct memory access) performs target accesses via target firewalls 3522.i and 3512.i of FIG. 20 connected on interconnects 2640. A target is a circuit block targeted or accessed by another circuit block operating as an initiator. In order to perform such accesses the DMA channels in DMA subsystems 3510.i are programmed. Each DMA channel specifies the source location of the Data to be transferred from an initiator and the destination location of the Data for a target. Some Initiators are MPU 2610, DSP DMA 3510.2, SDMA 3510.1, Universal Serial Bus USB HS, virtual processor data read/write and instruction access, virtual system direct memory access, display 3510.4, DSP MMU (memory management unit), camera 3510.3, and a secure debug access port to emulation block EMU for testing and debug (not to be confused with emulation prevention pattern insertion and removal).

Data exchange between a peripheral subsystem and a memory subsystem and general system transactions from memory to memory are handled by the System SDMA 3510.1. Data exchanges within a DSP subsystem 3510.2 are handled by the DSP DMA 3518.2. Data exchange to store camera capture is handled using a Camera DMA 3518.3 in camera subsystem CAM 3510.3. The CAM subsystem 3510.3 suitably handles one or two camera inputs of either serial or parallel data transfer types, and provides image capture hardware image pipeline and preview. Data exchange to refresh a display is handled in a display subsystem 3510.4 using a DISP (display) DMA 3518.4. This subsystem 3510.4, for instance, includes a dual output three layer display processor for 1xGraphics and 2xVideo, temporal dithering (turning pixels on and off to produce grays or intermediate colors) and SDTV to QCIF video format and translation between other video format pairs. The Display block 3510.4 feeds an LCD (liquid crystal display), plasma display, DLP™ display panel or DLP™ projector system, using either a serial or parallel interface. Also television output TV and Amp provide CVBS or S-Video output and other television output types.

In FIG. 20, a hardware security architecture including SSM 2460 propagates Mreqxxx qualifiers on the interconnect 3521 and 3534. The MPU 2610 issues bus transactions and sets some qualifiers on Interconnect 3521. SSM 2460 also provides one or more MreqSystem qualifiers. The bus transactions propagate through the L4 Interconnect 3534 and line 3538 then reach a DMA Access Properties Firewall 3512.1. Transactions are coupled to a DMA engine 3518.i in each subsystem 3510.i which supplies a subsystem-specific interrupt to the Interrupt Handler 2720. Interrupt Handler 2720 is also fed one or more interrupts from Secure State Machine SSM 2460 that performs security protection functions. Interrupt Handler 2720 outputs interrupts for MPU 2610. In FIG. 20, firewall protection by firewalls 3522.i is provided for various system blocks 3520.i, such as GPMC (General Purpose Memory Controller) to Flash memory 3520.1 for firmware and updates, ROM 3520.2 for firmware, on-chip RAM 3520.3 for working run-time contexts and data, Video Codec 3520.4, WCDMA/HSDPA 3520.6, device-to-device SAD2D 3520.7 to Modem chip 1100, and a DSP 3520.8 and DSP DMA 3528.8. In some system embodiments, Video Codec 3520.4 has codec embodiments as shown in the other Figures herein. A System Memory Interface SMS with SMS Firewall 3555 is coupled to SDRC 3552.1 (External Memory Interface EMIF with SDRAM Refresh Controller) and to system SDRAM 3550 (Synchronous Dynamic Random Access Memory).

In FIG. 20, interconnect 3534 is also coupled to Control Module 2765 and FIG. 1 cryptographic accelerator CP_ACE 3540 (200) and PRCM 3570. Power, Reset and Clock Manager PRCM 3570 is coupled via L4 interconnect 3534 to Power IC circuitry in chip 1200, which supplies controllable supply voltages VDD1, VDD2, etc. PRCM 3570 is coupled to L4 Interconnect 3534 and coupled to Control Module 2765. PRCM 3570 is coupled to a DMA Firewall 3512.1 to receive a Security Violation signal, if a security violation occurs, and to respond with a Cold or Warm Reset output. Also PRCM 3570 is coupled to the SSM 2460.

In FIG. 20, some embodiments have symmetric multiprocessing (SMP) core(s) such as RISC processor cores in the MPU subsystem. One of the cores is called the SMP core. A hardware (HW) supported secure hypervisor runs at least on the SMP core. Linux SMP HLOS (high-level operating system) is symmetric across all cores and is chosen as the master HLOS in some embodiments.

The embodiments are suitably employed in gateways, decoders, set top boxes, receivers for receiving satellite video, cable TV over copper lines or fiber, DSL (Digital subscriber line) video encoders and decoders, television broadcasting and audio/video multicasting, optical disks and other storage media, encoders and decoders for video and multimedia services over packet networks, in video teleconferencing, and video surveillance. Some embodiments, such as fed from video surveillance sources, prepare numerous packet data streams for efficient transmission for remote reception point(s). Some embodiments handle numerous packet data streams for reception and distribution to multiple audio/visual display locations over an extended user space. Some embodiments handle and integrate numerous incoming packet data streams for concurrent intelligible delivery to the user experience in a more confined space.

Accordingly, it is emphasized that, although FIG. 1 for convenience has legends somewhat oriented toward the particular application of security and cryptographic processing, subsystem 200 is also applicable or extendable to other forms of pipelined multiple packet-stream processing. In such other forms, for instance, processing contexts other than or additional to security contexts are handled by module 510. Also, any particular modules or engines 310, 320, 370, etc., suitably can have different cores than, or additional cores beyond, the particular Crypto cores shown in the middle of FIGS. 10, 12 and 14. Various embodiments are prepared as subsystems and/or systems for all applications to which their advantages commend them now and in the future.

The system embodiments of and for FIG. 20 are also provided in a communications system and implemented as various embodiments in any one, some or all of cellular mobile telephone and data handsets, a cellular (telephony and data) base station, a WLAN AP (wireless local area network access point, IEEE 802.11 or otherwise), a Voice over WLAN Gateway with user video/voice over packet telephone, and a video/voice enabled personal computer (PC) with another user video/voice over packet telephone, that communicate with each other. A camera CAM provides video pickup for a cell phone or other device to send over the internet to another cell phone, personal digital assistant/personal entertainment unit, gateway and/or set top box STB with television TV. Video storage and other storage, such as hard drive, flash drive, high density memory, and/or compact disk (CD) is provided for digital video recording (DVR) embodiments such as for delayed reproduction, transcoding, and retransmission of video to other handsets and other destinations. An STB embodiment includes a system interface, front end hardware, a framer, a multiplexer, a multi-stream bidirectional cable card (M-Card), and a demultiplexer. The STB includes a main processor(s), a transport packet parser, and a decoder, improved as taught herein and provided on a printed circuit board (PCB), a printed wiring board (PWB), and/or in an integrated circuit on a semiconductor substrate.

In FIG. 20, a Modem integrated circuit (IC) 1100 supports and provides wireless interfaces for any one or more of GSM, GPRS, EDGE, UMTS, and OFDMA/MIMO embodiments. Codecs for any or all of CDMA (Code Division Multiple Access), CDMA2000, and/or WCDMA (wideband CDMA or UMTS) wireless are provided, suitably with HSDPA/HSUPA (High Speed Downlink Packet Access, High Speed Uplink Packet Access) (or 1×EV-DV, 1×EV-DO or 3×EV-DV) data feature via an analog baseband chip and RF GSM/CDMA chip to a wireless antenna. Replication of blocks and antennas is provided in a cost-efficient manner to support MIMO OFDMA of some embodiments. Modem 1100 also includes a television RF front end and demodulator for HDTV and DVB (Digital Video Broadcasting) to provide H.264 and other packetized compressed video/audio streams for Start Code detection, slice parsing, and entropy decoding by the circuits of the other Figures herein. An audio block in an Analog/Power IC 1200 has audio I/O (input/output) circuits to a speaker, a microphone, and/or headphones as illustrated in FIG. 20. A touch screen interface is coupled to a touch screen XY off-chip in some embodiments for display and control. A battery provides power to mobile embodiments of the system and battery data on suitably provided lines from the battery pack.

DLP™ display technology from Texas Instruments Incorporated is coupled to one or more imaging/video interfaces. A transparent organic semiconductor display is provided on one or more windows of a vehicle and wirelessly or wireline-coupled to the video feed. WLAN and/or WiMax integrated circuit MAC (media access controller), PHY (physical layer) and AFE (analog front end) support streaming video over WLAN. A MIMO UWB (ultra wideband) MAC/PHY supports OFDM in 3-10 GHz UWB bands for communications in some embodiments. A digital video integrated circuit provides television antenna tuning, antenna selection, filtering, RF input stage for recovering video/audio and controls from a DVB station.

Various embodiments are thus used with one or more microprocessors, each microprocessor having a pipeline, and selected from the group consisting of 1) reduced instruction set computing (RISC), 2) digital signal processing (DSP), 3) complex instruction set computing (CISC), 4) superscalar, 5) skewed pipelines, 6) in-order, 7) out-of-order, 8) very long instruction word (VLIW), 9) single instruction multiple data (SIMD), 10) multiple instruction multiple data (MIMD), 11) multiple-core using any one or more of the foregoing, and 12) microcontroller pipelines, control peripherals, and other micro-control blocks using any one or more of the foregoing.

A packet-based communication system can be an electronic (wired or wireless) communication system or an optical communication system.

Various embodiments as described herein are manufactured in a process that prepares RTL (register transfer language or hardware design language HDL) and netlist for a particular design including circuits of the Figures herein in one or more integrated circuits or a system. The design of the encoder and decoder and other hardware is verified in simulation electronically on the RTL and netlist. Verification checks contents and timing of registers, operation of hardware circuits under various configurations, packet parsing, and data stream detection, bit operations and encode and/or decode for H.264 and other video coded bit streams, proper responses to Host and to MCE, real-time and non-real-time operations and interrupts, responsiveness to transitions through confidentiality modes and other modes, sleep/wakeup, and various attack scenarios. When satisfactory, the verified design dataset and pattern generation dataset go to fabrication in a wafer fab and packaging/assembly produces a resulting integrated circuit and tests it with real time voice, video and data. Testing verifies operations directly on first-silicon and production samples such as by using scan chain methodology on registers and other circuitry until satisfactory chips are obtained. A particular design and printed wiring board (PWB) of the system unit has a video codec applications processor coupled to a modem, together with one or more peripherals coupled to the processor and a user interface coupled to the processor. A storage, such as SDRAM and Flash memory, is coupled to the system and has VLC tables, configuration and parameters and a real-time operating system RTOS, image codec-related software such as for processor issuing Commands and Instructions as described elsewhere herein, public HLOS, protected applications (PPAs and PAs), and other supervisory software. System testing tests operations of the integrated circuit(s) and system in actual application for efficiency and satisfactory operation of fixed or mobile video display for continuity of content, phone, e-mails/data service, web browsing, voice over packet, content player for continuity of content, camera/imaging, audio/video synchronization, and other such operation that is apparent to the human user and can be evaluated by system use. Also, various attack scenarios are applied. If further increased efficiency is called for, parameter(s) are reconfigured for further testing. Adjusted parameter(s) are loaded into the Flash memory or otherwise, components are assembled on PWB to produce resulting system units.

The packet filtering described herein facilitates operations in RISC (reduced instruction set computing), CISC (complex instruction set computing), DSP (digital signal processors), microcontrollers, PC (personal computer) main microprocessors, math coprocessors, VLIW (very long instruction word), SIMD (single instruction multiple data) and MIMD (multiple instruction multiple data) processors and coprocessors as cores or standalone integrated circuits, and in other integrated circuits and arrays.

The cryptographic accelerator CP_ACE is useful in other types of integrated circuits such as ASICs (application specific integrated circuits) and gate arrays and to all circuits to which the advantages of the improvements described herein commend their use.

Turning to FIGS. 21-22, an assembler is created to compile and assemble MCE assembly code to machine code. The assembler is written, for instance in Perl, for the MCE architecture to efficiently convert the MCE assembly code to optimized machine code.

The assembly instructions follow a specific syntax format. Each field in the instruction is separated by a comma. Lines that start with # are comments and will not be processed. The decimal number at the leftmost column is shown in this example only for reference. The MCE Assembler allows the user to specify one of three starting points, SOP, MOP and EOP, by adding a corresponding label at the front of the starting section.
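The syntax rules just stated can be sketched as a small front-end parser. The Python sketch below is illustrative only and is not the Perl assembler of the description. The function name parse_mce_source is hypothetical, and the label spelling (a line such as SOP: standing alone) is an assumption, since the text does not give the exact label syntax. The sketch does not encode machine code; it only splits comma-separated fields, skips # comment lines, drops the optional leading reference number, and records the instruction offset at which each labeled section starts.

```python
from typing import Dict, List, Tuple

START_LABELS = ("SOP", "MOP", "EOP")


def parse_mce_source(text: str) -> Tuple[List[Tuple[str, ...]], Dict[str, int]]:
    """Return (instructions, start_offsets) for MCE-style assembly text.

    instructions: one tuple of fields per instruction line.
    start_offsets: zero-based instruction offset for each label seen.
    Label syntax 'SOP:' on its own line is an assumption of this sketch.
    """
    instructions: List[Tuple[str, ...]] = []
    start_offsets: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment: not processed
        if line.endswith(":") and line[:-1] in START_LABELS:
            start_offsets[line[:-1]] = len(instructions)
            continue
        # Drop the leftmost decimal number, which is shown for reference only.
        head, _, rest = line.partition(" ")
        if head.isdigit():
            line = rest.strip()
        instructions.append(tuple(field.strip() for field in line.split(",")))
    return instructions, start_offsets
```

Keeping parsing separate from encoding lets the same front end feed either a machine-code generator or a simulator.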

In FIG. 21, a process of creating the MCE instructions involves the following steps for example: Based on the mode and algorithm specification as input, the mode operations are converted to logical operations in MCE instruction format. The logical operations are converted into machine code using the MCE assembler, and finally simulated in hardware to verify the output.

In FIG. 22, an MCE example is described here for GCM (Galois-Counter Mode) to provide confidentiality and authentication in IPSEC. GCM involves two main functions: block cipher encryption, which typically uses the AES algorithm, and a Galois multiplication procedure. GCM delivers two outputs: encrypted text (ciphertext) and an authentication tag. The authenticated encryption operation is shown in FIG. 22 wherein the Ek notation denotes the block cipher encryption using the key K, ‘multH’ denotes a Galois multiplication by the hash key H, and “incr” denotes a counter increment operation.
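For readers who want a software reference for two of the building blocks named above, the following Python sketch models the multH and incr operations as they are defined for GCM in NIST SP 800-38D. It is not the hardware core of FIG. 10: the function names mult_h and incr32 are hypothetical, 128-bit blocks are represented as Python integers with the most significant bit first, and the block cipher Ek is deliberately left out because it requires an AES implementation.

```python
# Reduction constant for GF(2^128) in GCM bit order (11100001 followed by zeros).
R = 0xE1 << 120
MASK32 = 0xFFFFFFFF


def mult_h(x: int, h: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using the GCM bit convention."""
    z = 0
    v = h
    for i in range(128):
        if (x >> (127 - i)) & 1:     # bits of x are scanned leftmost first
            z ^= v
        # Multiply v by the field element 'x': shift right, reduce if needed.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def incr32(counter_block: int) -> int:
    """Increment the rightmost 32 bits of a 128-bit block modulo 2^32."""
    upper = counter_block & ~MASK32
    return upper | ((counter_block + 1) & MASK32)
```

In this bit convention the multiplicative identity is the block with only its leftmost bit set, that is 1 << 127, which gives a quick sanity check on any implementation of multH.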

Implementation of the FIG. 22 GCM operation using MCE assembly code is shown below as TABLE 32. Refer to TABLE 13 for instruction description corresponding to assembly code entries of the type MCE_<Instruction Name>. The instruction line numbers in TABLE 32 are correlated to enumeration boxes marked on FIG. 22. The instruction line numbers in TABLE 32 also represent locations of 12-bit machine coded instruction in the Instruction Array of FIG. 11 that are decoded by Decode block and executed by Execute block in the FIG. 11 MCE. Regarding TABLE 32 Aux1, 2, 3, 4, see also TABLE 11 and FIG. 10 blocks for Context Controller and Context Update and FIG. 11 path 570, 640, 620, 660, 570. For Plaintext, see also FIG. 9, and FIG. 10 In Packer and FIG. 11 path 260, 650, 620. PROC instructions (TABLE 32, TABLE 13) call cores in FIG. 10 that return outputs, such as Ek or multH of FIG. 22, and thereafter see FIG. 10 Out Packer and FIG. 11 output delivery path 620, 670, 260.

TABLE 32 MCE ASSEMBLY CODE EXAMPLE FOR GCM

# Aux1[255:128] = hash key H (used in Galois multiplication)
# Aux1[127:0 ] = Len(A) ∥ Len(C)
# Aux2 = Reg1 = AAD Additional Authenticated Data
# Aux3 = Reg2 = {IV, CTR}
# Aux4 = Reg3 -> Ek(counter0) operation
# Plaintext = Reg0 -> loaded in every round (each round is 16-byte).
# **** First block of input use the following operations *****
# Process counter0 using AES, store it in Aux4
 1 MCE_PROC, MISC_AESKEY_128, CORE_AES_KEY_KEYIN, REG2
 2 MCE_WAIT, REG3, SRC2_ZERO, SRC1_DFC
# Process the AAD and store the result in R2 (Aux3)
 3 MCE_PROC, MISC_00, CORE_GM_KEY_AUXIN, REG1
 4 MCE_WAIT, REG1, SRC2_ZERO, SRC1_DFC
# **** Round 2 and later use the following operations *****
 5 MCE_INC, REG2, 000, REG2
 6 MCE_PROC_MASK, MISC_AESKEY_128, CORE_AES_KEY_KEYIN, REG2
 7 MCE_WAIT, REG0, REG0, SRC1_DFC_XOR_SRC2
 8 MCE_XOR, REG1, REG0, REG1
 9 MCE_PROC, MISC_00, CORE_GM_KEY_AUXIN, REG1
10 MCE_JUMP, 01100, IF_EOP
# Only does the following if this is NOT the last round.
11 MCE_OUTSET, REG2, DATAOUT_DFC REG0
12 MCE_WOUT, REG1, SRC2_ZERO, SRC1_DFC
# The jump instruction above goes to here if this is the last round (EOP).
13 MCE_WAIT, REG2, SRC2_AUX1_LOWER, SRC1_DFC_XOR_SRC2
14 MCE_PROC, MISC_00, CORE_GM_KEY_AUXIN, REG2
15 MCE_OUTSET, REG0, DATAOUT DFC XOR_WOUT_SRC2, REG0
16 MCE_WOUT, REG0, REG3, SRC1_DFC

In the assembly code above, sixteen assembly instructions realize GCM mode. Since the operations for the first round differ from the later rounds, the offsets are specified as: start of packet (SOP) offset=0, middle of packet (MOP) and end of packet (EOP) offset=4. That means instructions 1 (MCE_PROC) through 12 (MCE_WOUT) execute sequentially in the first round. In the second and later rounds, instructions 5 (MCE_INC) through 12 (MCE_WOUT) execute sequentially. However, when instruction 10 (MCE_JUMP) is encountered and this round is the last round, execution skips instructions 11 and 12, jumps to instruction 13 (MCE_WAIT), and continues through instruction 16 (MCE_WOUT). The output of the Perl assembler is a sequence of machine-code instructions in binary form, equal in number to the instructions listed in the assembly code like that listed above, each machine-code instruction including its opcode and its bit-fields Field2, 1, 0.
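The round-by-round control flow just described can be traced with a short model. The Python sketch below is illustrative only: the function name executed_instructions is hypothetical, and it assumes that a round ends after instruction 12 (MCE_WOUT) on the non-final path and after instruction 16 on the final path, which is how the paragraph above reads. It models only which instruction numbers run in a SOP, MOP or EOP round, not what the instructions do. The case of a packet that has only a single round is not covered, since the text does not describe it.

```python
from typing import List

# Zero-based start offsets stated in the text (instruction numbers are 1-based).
START_OFFSET = {"SOP": 0, "MOP": 4, "EOP": 4}
JUMP_INSTR = 10      # MCE_JUMP ... IF_EOP
JUMP_TARGET = 13     # first instruction of the last-round tail
END_NOT_LAST = 12    # assumed end of a non-final round (MCE_WOUT)
END_LAST = 16        # end of the final round (MCE_WOUT)


def executed_instructions(round_kind: str) -> List[int]:
    """Return the 1-based instruction numbers executed in one round."""
    is_last = (round_kind == "EOP")
    pc = START_OFFSET[round_kind] + 1
    trace = []
    while True:
        trace.append(pc)
        if pc == JUMP_INSTR and is_last:
            pc = JUMP_TARGET          # jump taken only in the last round
        elif pc in (END_NOT_LAST, END_LAST):
            break                     # round complete
        else:
            pc += 1
    return trace
```

Under these assumptions a SOP round runs instructions 1 through 12, a MOP round runs 5 through 12, and an EOP round runs 5 through 10 followed by 13 through 16.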

Mode Control Engine MCE of FIG. 11 provides a significant advantage in flexibility and control to program a sequence of operations through uncomplicated software. MCE can thus implement any mode that uses a cipher core inside the encryption engine. Thus, the same security accelerator is re-usable in devices with different security requirements.

Moreover, MCE (mode control engine) can add or support new cryptographic operational modes in the field by changing the micro-instructions, thereby adjusting the hardware at run-time to support new modes at high performance in native hardware.

Since the MCE instructions are devised specifically for cryptographic mode processing in this example, MCE delivers high performance and adds low or little overhead over the native cryptographic processing (AES, 3DES etc.) cores together with which cores MCE processes its mode operations. The cryptographic engine using MCE occupies much smaller area compared to hardware cores respectively dedicated to each mode and useless for the other modes.

In addition to inventive structures, devices, apparatus and systems, processes are represented and described using any and all of the block diagrams, logic diagrams, and flow diagrams herein. Block diagram blocks are used to represent both structures as understood by those of ordinary skill in the art as well as process steps and portions of process flows. Similarly, logic elements in the diagrams represent both electronic structures and process steps and portions of process flows. Flow diagram symbols herein represent process steps and portions of process flows in software and hardware embodiments as well as portions of structure in various embodiments of the invention.

ASPECTS (See Notes paragraph at end of this Aspects section.)

1A. The electronic circuit claimed in claim 1 wherein said securitycontext cache module includes a data lookup cache portion and a securitycontext cache portion, and arbitrated port controllers coupled to saiddata lookup cache portion and to said security context cache portion.

1B. The electronic circuit claimed in claim 1 further comprising asecurity context cache module that is operable on a demand basis tofetch and later evict a respective control data structure for eachsecurity context.

1B1. The electronic circuit claimed in claim 1B further comprising anexternal memory coupled with said host processor, and at least one suchcontrol data structure holding a cryptographic key and a cryptographicmode indication from said external memory.

1C. The electronic circuit claimed in claim 1 wherein said control planeengine is operable to programmably organize a logical topology of dataplane engines.

1C1. The electronic circuit claimed in claim 1C wherein under suchtopology, buffers are re-arranged into programmably-specifiedoperational order to establish a particular process.

1C2. The electronic circuit claimed in claim 1C further comprising amultiple-buffer circuit having multiple inputs and outputs wherein thelogical topology includes a selectable sequence of couplings formed insaid multiple-buffer circuit for at least two of said engines.

1C3. The electronic circuit claimed in claim 1C further comprisingingress streaming interfaces and egress streaming interfaces, saidingress streaming interfaces operable so that multiple packet flowsstream into said ingress streaming interfaces, and said ingressstreaming interfaces are coupled to the logical topology forapproximately concurrent data flow and processing that in turn supplyrespective output data streams to said egress streaming interfaces.

1D. The electronic circuit claimed in claim 1 wherein said subsystem isadaptive by allowing firmware controlled security header processing andhardware-driven, any-order data staging, cipher block formatting andcryptographic processing.

1E. The electronic circuit claimed in claim 1 wherein processes can bein one or in plural security contexts.

1F. The electronic circuit claimed in claim 1 further comprising acontext cache, and wherein a sequence order for the processing by saidengines is established by the at least one said control plane engineusing information in at least the context cache.

1G. The electronic circuit claimed in claim 1 wherein said data-planeengine and said control-plane engine are together operationally scalableto provide more processing for additional data streams.

1H. The electronic circuit claimed in claim 1 wherein said data-planeengine and said control-plane engine are together operable foranti-replay protection against a replay attack.

1J. The electronic circuit claimed in claim 1 further comprising achunking circuit operable to store at least some input packets assmaller chunks and responsive to a quality-of-service (QoS) input byswitching within a packet to schedule the data chunks for processing bysaid data-plane engine based on the QoS input, whereby the response tothe QoS input is made more swiftly effective.

1K. The electronic circuit claimed in claim 1 further comprising anexternal memory and a context cache, said context cache operable tofetch and evict a control data structure from and to said externalmemory, and wherein said data-plane engine and said control-plane engineare together operable to cryptographically process the control datastructure to safeguard at least part of said control data structure inthe external memory.

1L. The electronic circuit claimed in claim 1 wherein the at least onesaid data-plane engine has functional units, the at least one saidcontrol-plane engine further operable to configurably establish any of aplurality of different effectively-coiled sequences of and selected fromsaid functional units and said control-plane engine.

1M. The electronic circuit claimed in claim 1 wherein said hostprocessor has a host memory and is operable to store a key and controlstructure in said host memory, and the at least one said data planeengine and control plane engine are operable to access the key andcontrol structure to encrypt and decrypt such key, provideconnection-specific control flags, anti-replay windows, and firmwareparameters, and establish static connection values (nonce/salt).

1N. The electronic circuit claimed in claim 1 further comprising a configuration circuit coupled with the at least one said control-plane engine, and a public key accelerator module coupled to said configuration circuit, and wherein said host processor is operable to store configuration data in said configuration circuit.

1N1. The electronic circuit claimed in claim 1N further comprising a random number generator module coupled to said configuration circuit.

1P. The electronic circuit claimed in claim 1 further comprising a scheduler having inputs and outputs operable to selectively couple said engines in an operational sequence, and a block manager module coupled to said scheduler circuit.

3A. The electronic circuit claimed in claim 3 wherein hardware-driven, any-order data staging is thereby effectuated.

22A. The security context cache module claimed in claim 22 further comprising a request-fetch circuit operable to fetch at least another security context and associate each such other security context with an ingress packet, as and when requested by the host processor.

23A. The security context cache module claimed in claim 23 wherein said eviction circuit is responsive to control flags to indicate a start-of-packet, to force-evict, and to force teardown of a security context.

25A. The security context cache module claimed in claim 25 further comprising a logic circuit for setting and resetting an ownership bit for host processor control or local processor control.

27A. The streaming interface claimed in claim 27 wherein said control circuit is operable to lock a high-speed connection.

27B. The streaming interface claimed in claim 27 further comprising a scatter-gather direct memory access circuit coupled with said control circuit.

29A. The streaming interface claimed in claim 27 wherein said control logic is operable to execute tear down only after all buffered packets from the packet stream are processed.

30A. The control method claimed in claim 30 wherein the host-loading, supplying, operating, and processing involve plural contexts.

30B. The control method claimed in claim 30 wherein at least one packet in the stream of packets includes an identification of a particular cryptographic process, and the processing includes responding to the identification of the particular cryptographic process to generate a set of engine identifications ordered in a particular order to specify the processing topology, and processing the stream of packets using a set of engines in the subsystem operated in a pipeline order represented by the set of ordered engine identifications, whereby to effectuate the particular cryptographic process.
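
The flow of Aspect 30B, in which a process identification carried by a packet is turned into an ordered set of engine identifications that then fixes the pipeline order, can be sketched as a table lookup followed by a loop. Everything named here is hypothetical: the process names, the engine identifiers `ENCR` and `AUTH`, and the two toy transforms, which are placeholders and perform no real cryptography.

```python
def toy_encrypt(data):
    """Placeholder for an encryption engine: XOR with a fixed byte (NOT real crypto)."""
    return bytes(b ^ 0x5A for b in data)


def toy_authenticate(data):
    """Placeholder for an authentication engine: append a 1-byte checksum (NOT a real MAC)."""
    return data + bytes([sum(data) % 256])


# Hypothetical engine identifications mapped to the engines they select.
ENGINES = {"ENCR": toy_encrypt, "AUTH": toy_authenticate}

# Hypothetical table: cryptographic process identification -> ordered engine identifications.
PROCESS_TOPOLOGY = {
    "encrypt-then-auth": ["ENCR", "AUTH"],
    "auth-only": ["AUTH"],
}


def engine_ids_for(packet):
    """Generate the ordered engine identifications for the packet's process."""
    return list(PROCESS_TOPOLOGY[packet["process"]])


def process_packet(packet):
    """Run the payload through the engines in the order given by the identifications."""
    data = packet["payload"]
    for engine_id in engine_ids_for(packet):
        data = ENGINES[engine_id](data)
    return data
```

Changing the list stored for a process changes the pipeline order without touching the engines themselves, which is the property the ordered identifications are meant to capture.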

30C. The control method claimed in claim 30 further comprising storing at least some individual packets in chunks of a packet, and wherein the control data in the context includes at least one offset, and said processing includes selectively applying the offset to access different parts of the program instructions to process a chunk depending on a position in its packet from which the chunk is stored.
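
As a rough illustration of Aspect 30C, the position of a chunk within its packet can select which offset from the context is applied to reach the matching part of the program instructions. The sketch assumes three positions (first, middle, last) and the field names `program_base`, `first_offset`, `middle_offset`, and `last_offset`; none of these names or values comes from this description.

```python
def entry_point(context, chunk_index, chunk_count):
    """Pick the instruction address for a chunk from its position in the packet."""
    if chunk_index == 0:
        key = "first_offset"      # also covers a packet that fits in a single chunk
    elif chunk_index == chunk_count - 1:
        key = "last_offset"
    else:
        key = "middle_offset"
    return context["program_base"] + context[key]


# Hypothetical context contents, for illustration only.
example_context = {
    "program_base": 0x100,
    "first_offset": 0x00,
    "middle_offset": 0x20,
    "last_offset": 0x40,
}
```

With `example_context`, the three chunks of a three-chunk packet would start at addresses 0x100, 0x120, and 0x140 respectively.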

30D. The control method claimed in claim 30 wherein the access to the context includes loading a copy of the context from the first storage area into the subsystem.

30E. The control method claimed in claim 96 wherein the operating includes using an ownership flag to transfer ownership of the stream to the subsystem.

39A. The communication method claimed in claim 39 wherein the at least one command includes a Pass1 engine identification and a Pass2 engine identification wherein Pass1 and Pass2 can be used in any order if a same hardware engine is not used twice in the flow including a process selected from the group consisting of 1) AUTH(Pass2)→ENCR(Pass1), and 2) AUTH(Pass1)→ENCR(Pass2), and when instead a same hardware engine is used for both Encryption and Authentication then second pass uses Pass2 engine identification.

39A1. The communication method claimed in claim 39 wherein one of said engines is an air cipher engine operable for both Kasumi-authentication and Kasumi-encryption for inbound flow, and Kasumi-authentication uses Pass1 engine identification, and Kasumi-encryption uses Pass2 engine identification.
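
The Pass1/Pass2 rule stated in Aspects 39A and 39A1 can be restated as a small check over a two-step flow. The sketch below is one reading of that rule, not a definitive one: the function name `is_valid_flow`, the tuple layout, and the engine names used in the examples are hypothetical.

```python
def is_valid_flow(flow):
    """Check a two-step flow of (operation, hardware_engine, pass_id) in execution order.

    Reading assumed here: with two different hardware engines, Pass1 and Pass2 may
    appear in either order; with one engine used for both steps, the second step
    must carry Pass2.
    """
    (_, engine_a, pass_a), (_, engine_b, pass_b) = flow
    if {pass_a, pass_b} != {"Pass1", "Pass2"}:
        return False              # each identification must be used exactly once
    if engine_a == engine_b:
        return pass_b == "Pass2"  # same engine twice: second pass uses Pass2
    return True                   # different engines: either order is acceptable
```

For example, `[("AUTH", "air_cipher", "Pass1"), ("ENCR", "air_cipher", "Pass2")]` passes the check, matching the Kasumi case of Aspect 39A1, while the same flow with the pass identifications swapped does not.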

42A. The electronic buffering circuit claimed in claim 42 further comprising configuration bus, configuration registers, said configuration registers coupled with at least one of said processors.

43A. The electronic buffering circuit claimed in claim 43 further comprising a direct memory access (DMA) circuit and at least two additional buffers respectively coupling said two ingress interface circuits to said selection circuit.

43B. The electronic buffering circuit claimed in claim 43 further comprising a storage for packet information, said storage coupled with said ingress interface circuits.

Notes about Aspects above: Aspects are paragraphs which might be offered as claims in patent prosecution. The above dependently-written Aspects have leading digits and internal dependency designations to indicate the claims or aspects to which they pertain. Aspects having no internal dependency designations have leading digits and alphanumerics to indicate the position in the ordering of claims at which they might be situated if offered as claims in prosecution.

Processing circuitry comprehends digital, analog and mixed signal (digital/analog) integrated circuits, ASIC circuits, PALs, PLAs, decoders, memories, and programmable and nonprogrammable processors, microcontrollers and other circuitry. Internal and external couplings and connections can be ohmic, capacitive, inductive, photonic, and direct or indirect via intervening circuits or otherwise as desirable. Process diagrams herein are representative of flow diagrams for operations of any embodiments whether of hardware, software, or firmware, and processes of manufacture thereof. Flow diagrams and block diagrams are each interpretable as representing structure and/or process. While this invention has been described with reference to illustrative embodiments, this description is not to be construed in a limiting sense. Various modifications and combinations of the illustrative embodiments, as well as other embodiments of the invention may be made. The terms including, includes, having, has, with, or variants thereof are used in the detailed description and/or the claims to denote non-exhaustive inclusion in a manner similar to the term comprising. The appended claims and their equivalents should be interpreted to cover any such embodiments, modifications, and embodiments as fall within the scope of the invention.

What is claimed is:
 1. A packet-processing electronic subsystem comprising: (a) a first data interface having an input for accepting first streaming data, an encryption input, and a first streaming output; (b) a second data interface having an input for accepting second streaming data and having an output; (c) a third data interface having an output for egress of third streaming data and having an input: (d) a fourth data interface having an output for egress of fourth streaming data and having an input, the first, second, third, and fourth data interfaces being separate from one another; (e) scheduler circuitry having a first streaming input, a second streaming input coupled to the output of the second interface, having outputs coupled to the inputs of the third, and fourth data interfaces, and including a packet memory, the scheduler circuitry having a security context cache interface, and an encryption input; (f) a security context cache coupled to the security context cache interface of the scheduler circuitry and including a cache controller and cache storage for a security context, the security context cache on a demand basis fetching and later evicting a control data structure for the security context; (g) an encryption module coupled to the encryption interface of the scheduler circuitry, the encryption module including control circuitry and encryption accelerators responding to the security context in the cache storage, and having an encryption output; (h) first buffer circuitry having an input coupled to the first streaming output and an output coupled to the first streaming input; and (i) second buffer circuitry having an input coupled to the encryption output and an output coupled to the encryption input.
 2. The subsystem of claim 1 in which the first interface is a packet accelerator ingress Communication Processor Peripheral Interface (CPPI) streaming interface.
 3. The subsystem of claim 1 in which the second interface is a code division multiple access (CDMA) ingress Communication Processor Peripheral Interface (CPPI) streaming interface.
 4. The subsystem of claim 1 in which the third interface is a packet accelerator egress Communication Processor Peripheral Interface (CPPI) streaming interface.
 5. The subsystem of claim 1 in which the fourth interface is a code division multiple access (CDMA) egress Communication Processor Peripheral Interface (CPPI) streaming interface.